SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition
Jiaqi Wang, Liutao Yu, Xiongri Shen, Sihang Guo, Chenlin Zhou, Leilei Zhao, Yi Zhong, Zhiguo Zhang, Zhengyu Ma
Spiking neural networks (SNNs) offer a promising path toward energy-efficient speech command recognition (SCR) by leveraging their event-driven processing paradigm. However, existing SNN-based SCR methods often struggle to capture rich temporal dependencies and contextual information from speech due to limited temporal modeling and binary spike-based representations. To address these challenges, we first introduce the multi-view spiking temporal-aware self-attention (MSTASA) module, which combines effective spiking temporal-aware attention with a multi-view learning framework to model complementary temporal dependencies in speech commands. Building on MSTASA, we further propose SpikCommander, a fully spike-driven transformer architecture that integrates MSTASA with a spiking contextual refinement channel MLP (SCR-MLP) to jointly enhance temporal context modeling and channel-wise feature integration. We evaluate our method on three benchmark datasets: the Spiking Heidelberg Dataset (SHD), Spiking Speech Commands (SSC), and Google Speech Commands V2 (GSC). Extensive experiments show that SpikCommander consistently outperforms state-of-the-art (SOTA) SNN methods with fewer parameters under comparable time steps, highlighting its effectiveness and efficiency for robust speech command recognition.
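The abstract does not give the exact MSTASA formulation, but the general idea of spike-based multi-view self-attention can be illustrated with a minimal sketch. The sketch below assumes a common spiking-transformer recipe (binary spikes from a Heaviside threshold, softmax-free attention over spike tensors, and view fusion by summation plus re-thresholding); all function names, the number of views, and the fusion rule are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def heaviside(x, thresh=1.0):
    # Binary spike generation: fire (1) when input crosses the threshold.
    return (x >= thresh).astype(np.float32)

def spiking_attention(s_q, s_k, s_v):
    # Softmax-free attention over binary spike tensors, as is common in
    # spiking transformers: integer-valued spike correlations, then
    # re-binarization so every activation remains a spike.
    attn = s_q @ s_k.T                                  # (T, T) spike correlations
    return heaviside(attn @ s_v, thresh=attn.shape[-1] * 0.5)

def multi_view_attention(x, n_views=3, thresh=1.0, seed=0):
    # Hypothetical multi-view wrapper: each view gets its own random
    # Q/K/V projections; view outputs are fused by summation and
    # re-thresholded back to binary spikes.
    rng = np.random.default_rng(seed)
    t, d = x.shape
    fused = np.zeros((t, d), dtype=np.float32)
    for _ in range(n_views):
        w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.5 for _ in range(3))
        s_q = heaviside(x @ w_q, thresh)
        s_k = heaviside(x @ w_k, thresh)
        s_v = heaviside(x @ w_v, thresh)
        fused += spiking_attention(s_q, s_k, s_v)
    return heaviside(fused, thresh=1.0)                 # binary output spikes

# Toy input: T=8 time steps, D=16 feature channels.
x = np.random.default_rng(1).standard_normal((8, 16)).astype(np.float32)
y = multi_view_attention(x)
print(y.shape)  # (8, 16); all entries are 0.0 or 1.0
```

The key property the sketch preserves is that every tensor passed between stages is binary, so the matrix products reduce to accumulate-only operations, which is the source of the energy efficiency claimed for spike-driven transformers.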