TMDC: A Two-Stage Modality Denoising and Complementation Framework for Multimodal Sentiment Analysis with Missing and Noisy Modalities
Yan Zhuang, Minhao Liu, Yanru Zhang, Jiawen Deng, Fuji Ren
Multimodal Sentiment Analysis (MSA) aims to infer human sentiment by integrating information from multiple modalities such as text, audio, and video. In real-world scenarios, however, the presence of missing modalities and noisy signals significantly hinders the robustness and accuracy of existing models. While prior works have made progress on these issues, they typically address them in isolation, limiting overall effectiveness in practical settings. To jointly mitigate the challenges posed by missing and noisy modalities, we propose a Two-Stage Modality Denoising and Complementation framework (TMDC). TMDC comprises two sequential training stages. In the intra-modality denoising stage, dedicated denoising modules extract denoised modality-specific and modality-shared representations from complete data, reducing the impact of noise and enhancing representational robustness. In the inter-modality complementation stage, these representations are leveraged to compensate for missing modalities, enriching the available information and further improving robustness. Extensive evaluations on MOSI, MOSEI, and IEMOCAP demonstrate that TMDC consistently achieves superior performance compared with existing methods, establishing new state-of-the-art results.
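The two-stage idea described in the abstract can be sketched at a toy level: per-modality encoders map (possibly noisy) inputs into a modality-shared space, and a missing modality is then complemented by decoding from the shared representations of the modalities that are present. Everything below is an illustrative assumption, not the paper's actual architecture: the dimensions, the random linear encoders/decoders, and the simple averaging rule stand in for TMDC's trained denoising and complementation modules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; the paper does not specify these).
D = {"text": 8, "audio": 6, "video": 6}   # input dim per modality
H = 4                                      # shared-representation dim

# Stage 1 sketch (intra-modality denoising): each modality gets an
# encoder into a modality-shared space. TMDC trains dedicated denoising
# modules for this; here they are untrained linear maps.
W_enc = {m: rng.standard_normal((d, H)) for m, d in D.items()}
# Decoders mapping the shared space back into each modality's space.
W_dec = {m: rng.standard_normal((H, d)) for m, d in D.items()}

def encode(modality, x):
    # Project a single-sample feature vector into the shared space.
    return x @ W_enc[modality]

def complement(available, missing_modality):
    """Stage 2 sketch (inter-modality complementation): average the
    shared representations of the available modalities and decode the
    result into the missing modality's feature space."""
    shared = np.mean(
        [encode(m, x) for m, x in available.items()], axis=0
    )
    return shared @ W_dec[missing_modality]

# Usage: video is missing; impute its features from text + audio.
sample = {m: rng.standard_normal(d) for m, d in D.items() if m != "video"}
video_hat = complement(sample, "video")
print(video_hat.shape)  # (6,)
```

In the actual framework the two stages are trained sequentially on complete data, so the complemented features carry denoised modality-specific and modality-shared information rather than random projections; this sketch only shows the data flow.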