NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks
Yang Li, Youssef Emad, Karthik Padthe, Jack Lanchantin, Weizhe Yuan, Thao Nguyen, Jason Weston, Shang-Wen Li, Dong Wang, Ilia Kulikov, Xian Li
Recent work has shown that distilling reasoning traces from a larger teacher model via supervised finetuning outperforms reinforcement learning with the smaller student model alone (Guo et al. 2025). However, there has not been a systematic study of what kind of reasoning demonstrations from the teacher are most effective in improving the student model's reasoning capabilities. In this work we curate high-quality "NaturalThoughts" by selecting reasoning traces from a strong teacher model based on a large pool of questions from NaturalReasoning (Yuan et al. 2025). We first conduct a systematic analysis of the factors that affect distillation of reasoning capabilities, in terms of sample efficiency and scalability on general reasoning tasks. We find that simply scaling up data size with random sampling yields steady performance gains. Further, we find that selecting difficult examples that require more diverse reasoning strategies is more sample-efficient for transferring the teacher model's reasoning skills. Evaluated on both Llama and Qwen models, training with NaturalThoughts outperforms existing reasoning datasets such as OpenThoughts and LIMO on general STEM reasoning benchmarks including GPQA-Diamond, MMLU-Pro, and SuperGPQA.
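To make the selection criterion in the abstract concrete, the sketch below illustrates one way to prefer difficult questions whose teacher traces exercise more diverse reasoning strategies and to pack the survivors into supervised-finetuning pairs. It is a minimal illustration under assumed conventions, not the authors' released pipeline: the field names (`strategies`, `reasoning`), the strategy-count heuristic, and the `<think>` target format are all hypothetical stand-ins for the paper's model-based annotation and filtering.

```python
"""Hypothetical sketch: select strategy-diverse teacher traces, then format them for SFT.
All names and heuristics here are illustrative assumptions, not the paper's exact method."""
from dataclasses import dataclass


@dataclass
class TeacherTrace:
    question: str
    reasoning: str           # long chain-of-thought produced by the teacher model
    answer: str
    strategies: list[str]    # annotated reasoning strategies, e.g. ["decomposition", "verification"]


def select_diverse_hard_traces(pool: list[TeacherTrace],
                               budget: int,
                               min_strategies: int = 3) -> list[TeacherTrace]:
    """Keep traces that use at least `min_strategies` distinct strategies,
    then return the `budget` examples with the most diverse reasoning."""
    eligible = [t for t in pool if len(set(t.strategies)) >= min_strategies]
    eligible.sort(key=lambda t: len(set(t.strategies)), reverse=True)
    return eligible[:budget]


def to_sft_example(trace: TeacherTrace) -> dict:
    """Pack a selected trace into a prompt/target pair for supervised finetuning."""
    return {
        "prompt": trace.question,
        "target": f"<think>{trace.reasoning}</think>\n{trace.answer}",
    }


if __name__ == "__main__":
    pool = [
        TeacherTrace("Why is the sky blue?", "First decompose the question ...",
                     "Rayleigh scattering.", ["decomposition", "analogy", "verification"]),
        TeacherTrace("2 + 2 = ?", "Trivial arithmetic.", "4", ["direct recall"]),
    ]
    selected = select_diverse_hard_traces(pool, budget=1)
    print([to_sft_example(t) for t in selected])
```

In this toy run only the first question survives, since its trace is annotated with three distinct strategies; the filtering threshold and budget would in practice be tuned against the sample-efficiency analysis the paper describes.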