Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics
Ziqian Zhang, Min Huang, Zhongzhe Xiao
Speech emotion recognition (SER) has advanced significantly thanks to deep-learning methods, and textual information further enhances its performance. However, few studies have focused on the physiological information involved in speech production, which also encodes speaker traits, including emotional states. To bridge this gap, we conducted a series of experiments to investigate the potential of phonation excitation information and articulatory kinematics for SER. Due to the scarcity of training data for this purpose, we introduce a portrayed emotional dataset, STEM-E2VA, which includes audio alongside physiological data such as electroglottography (EGG) and electromagnetic articulography (EMA). EGG and EMA provide information on phonation excitation and articulatory kinematics, respectively. Furthermore, instead of using the collected EGG and EMA, we performed emotion recognition with estimated physiological data derived through speech-inversion methods, to explore the feasibility of applying such physiological information in real-world SER. Experimental results confirm the effectiveness of incorporating physiological information about speech production into SER and demonstrate its potential for practical use in real-world scenarios.