NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks
Yang Li, Youssef Emad, Karthik Padthe, Jack Lanchantin, Weizhe Yuan, Thao Nguyen, Jason Weston, Shang-Wen Li, Dong Wang, Ilia Kulikov, Xian Li
Recent work has shown that distilling reasoning traces from a larger teacher model via supervised finetuning outperforms reinforcement learning with the smaller student model alone (Guo et al. 2025). However, there has not been a systematic study of what kind of reasoning demonstrations from the teacher are most effective in improving the student model's reasoning capabilities. In this work we curate high-quality "NaturalThoughts" by selecting reasoning traces from a strong teacher model based on a large pool of questions from NaturalReasoning (Yuan et al. 2025). We first conduct a systematic analysis of the factors that affect distillation of reasoning capabilities, in terms of sample efficiency and scalability on general reasoning tasks. We find that simply scaling up data size with random sampling yields steady performance gains. Further, we find that selecting difficult examples that require more diverse reasoning strategies is more sample-efficient for transferring the teacher model's reasoning skills. Evaluated on both Llama and Qwen models, training with NaturalThoughts outperforms existing reasoning datasets such as OpenThoughts and LIMO on general STEM reasoning benchmarks including GPQA-Diamond, MMLU-Pro, and SuperGPQA.
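To make the selection criterion in the abstract concrete, the sketch below illustrates one way to prefer difficult questions whose teacher traces exercise more diverse reasoning strategies and to pack the survivors into supervised-finetuning pairs. It is a minimal illustration under assumed conventions, not the authors' released pipeline: the field names (`strategies`, `reasoning`), the strategy-count heuristic, and the `<think>` target format are all hypothetical stand-ins for the paper's model-based annotation and filtering.

```python
"""Hypothetical sketch: select strategy-diverse teacher traces, then format them for SFT.
All names and heuristics here are illustrative assumptions, not the paper's exact method."""
from dataclasses import dataclass


@dataclass
class TeacherTrace:
    question: str
    reasoning: str           # long chain-of-thought produced by the teacher model
    answer: str
    strategies: list[str]    # annotated reasoning strategies, e.g. ["decomposition", "verification"]


def select_diverse_hard_traces(pool: list[TeacherTrace],
                               budget: int,
                               min_strategies: int = 3) -> list[TeacherTrace]:
    """Keep traces that use at least `min_strategies` distinct strategies,
    then return the `budget` examples with the most diverse reasoning."""
    eligible = [t for t in pool if len(set(t.strategies)) >= min_strategies]
    eligible.sort(key=lambda t: len(set(t.strategies)), reverse=True)
    return eligible[:budget]


def to_sft_example(trace: TeacherTrace) -> dict:
    """Pack a selected trace into a prompt/target pair for supervised finetuning."""
    return {
        "prompt": trace.question,
        "target": f"<think>{trace.reasoning}</think>\n{trace.answer}",
    }


if __name__ == "__main__":
    pool = [
        TeacherTrace("Why is the sky blue?", "First decompose the question ...",
                     "Rayleigh scattering.", ["decomposition", "analogy", "verification"]),
        TeacherTrace("2 + 2 = ?", "Trivial arithmetic.", "4", ["direct recall"]),
    ]
    selected = select_diverse_hard_traces(pool, budget=1)
    print([to_sft_example(t) for t in selected])
```

In this toy run only the first question survives, since its trace is annotated with three distinct strategies; the filtering threshold and budget would in practice be tuned against the sample-efficiency analysis the paper describes.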