活水快报 - 42Digest

WebSailor: Navigating Super-human Reasoning for Web Agent

Kuan Li, Zhongwang Zhang, Huifeng Yin, Liwen Zhang, Litu Ou, Jialong Wu, Wenbiao Yin, Baixuan Li, Zhengwei Tao, Xinyu Wang, Weizhou Shen, Junkai Zhang, Dingchu Zhang, Xixi Wu, Yong Jiang, Ming Yan, Pengjun Xie, Fei Huang, Jingren Zhou

arXiv

July 3, 2025

Transcending human cognitive limitations represents a critical frontier in LLM training. Proprietary agentic systems like DeepResearch have demonstrated superhuman capabilities on extremely complex information-seeking benchmarks such as BrowseComp, a feat previously unattainable. We posit that their success hinges on a sophisticated reasoning pattern absent in open-source models: the ability to systematically reduce extreme uncertainty when navigating vast information landscapes. Based on this insight, we introduce WebSailor, a complete post-training methodology designed to instill this crucial capability. Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation, an RFT cold start, and an efficient agentic reinforcement learning training algorithm, Duplicating Sampling Policy Optimization (DUPO). With this integrated pipeline, WebSailor significantly outperforms all open-source agents on complex information-seeking tasks, matching the performance of proprietary agents and closing the capability gap.

Computation and Language, Artificial Intelligence