Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice
Shanbo Cheng, Yu Bao, Zhichao Huang, Yu Lu, Ningxin Peng, Lu Xu, Runsheng Yu, Rong Cao, Yujiao Du, Ting Han, Yuxiang Hu, Zeyang Li, Sitong Liu, Shengtao Ma, Shiguang Pan, Jiongchen Xiao, Nuo Xu, Meng Yang, Rong Ye, Yiming Yu, Jun Zhang, Ruofei Zhang, Wanyi Zhang, Wenhao Zhu, et al.
Simultaneous Interpretation (SI) is one of the most challenging areas of the translation industry, and product-level automatic systems have long struggled with several problems: subpar transcription and translation quality, lack of real-time speech generation, multi-speaker confusion, and translated speech inflation, especially in long-form discourse. In this study, we introduce Seed-LiveInterpret 2.0, an end-to-end SI model that delivers high-fidelity, ultra-low-latency speech-to-speech generation with voice cloning capabilities. As a fully operational product-level solution, Seed-LiveInterpret 2.0 addresses these challenges directly through our novel duplex speech understanding-generation framework. Experimental results show that, through large-scale pretraining and reinforcement learning, the model achieves a significantly better balance between translation accuracy and latency, with performance validated by professional interpreters as exceeding 70%.