活水快报 - 42Digest

DOA Estimation with Lightweight Network on LLM-Aided Simulated Acoustic Scenes

Haowen Li, Zhengding Luo, Dongyuan Shi, Boxiang Wang, Junwei Ji, Ziyi Yang and Woon-Seng Gan

arXiv

November 11, 2025

Direction-of-Arrival (DOA) estimation is critical in spatial audio and acoustic signal processing, with wide-ranging real-world applications. Most existing DOA models are trained on synthetic data generated by convolving clean speech with room impulse responses (RIRs), which limits their generalizability due to constrained acoustic diversity. In this paper, we revisit DOA estimation using a recently introduced dataset constructed with the assistance of large language models (LLMs), which provides more realistic and diverse spatial audio scenes. We benchmark several neural-network-based DOA methods on this dataset and propose LightDOA, a lightweight DOA estimation model based on depthwise separable convolutions, designed specifically for multi-channel input in varying environments. Experimental results show that LightDOA achieves satisfactory accuracy and robustness across a variety of acoustic scenes while maintaining low computational complexity. This study highlights both the potential of spatial audio synthesized with the help of LLMs for advancing robust and efficient DOA estimation research, and LightDOA as an efficient solution for resource-constrained applications.
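
The abstract attributes LightDOA's low computational cost to depthwise separable convolutions. As a rough illustration of why that building block is cheap, the sketch below compares the weight count of a standard 2-D convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution. The function names and the layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) are assumptions made for this example; the abstract does not state LightDOA's actual layer configuration.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k 2-D convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes information across channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration (not from the paper).
c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(std, sep, round(std / sep, 2))
```

With these assumed sizes the standard layer has 73,728 weights and the depthwise separable one has 8,768, roughly 8.4 times fewer. The saving grows with the number of output channels and the kernel size, which is why this factorization is a common choice for resource-constrained models.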

Sound Processing, Artificial Intelligence
