活水快报 - 42Digest

Unmasking Deepfakes: Leveraging Augmentations and Features Variability for Deepfake Speech Detection

Inbal Rimon, Oren Gal, Haim Permuter

arXiv

January 9, 2025

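Deepfake speech detection presents a growing challenge as generative audio technologies continue to advance. We propose a hybrid training framework that improves detection performance through novel augmentation strategies. First, we introduce a dual-stage masking approach that operates both at the spectrogram level (MaskedSpec) and within the latent feature space (MaskedFeature), providing complementary regularization that improves tolerance to localized distortions and enhances generalization. Second, we introduce a compression-aware strategy during self-supervised training to increase variability in low-resource scenarios while preserving the integrity of the learned representations, thereby improving the suitability of pretrained features for deepfake detection. The framework integrates a learnable self-supervised feature extractor with a ResNet classification head in a unified training pipeline, enabling joint adaptation of acoustic representations and discriminative patterns. On the ASVspoof5 Challenge (Track 1), the system achieves state-of-the-art results with an equal error rate (EER) of 4.08% under the closed condition, further reduced to 2.71% by fusing models built on diverse pretrained feature extractors. When trained on ASVspoof2019, our system obtains leading performance on the ASVspoof2019 evaluation set (0.18% EER) and the ASVspoof2021 DF task (2.92%).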
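The dual-stage masking idea can be illustrated with a small sketch. The abstract does not give the exact masking scheme, so the code below assumes a SpecAugment-style approach in which random contiguous bands are blanked along the time or channel axis. The helper name `mask_bands` and its parameters are hypothetical and are not taken from the paper. The same helper is applied once to a toy spectrogram (in the spirit of MaskedSpec) and once to a toy latent feature sequence (in the spirit of MaskedFeature).

```python
import random


def mask_bands(frames, num_masks=2, max_width=3, axis="time", fill=0.0, rng=None):
    """Return a copy of `frames` with random contiguous bands set to `fill`.

    `frames` is a list of per-frame vectors (frames x channels).
    axis="time" blanks whole frames; axis="channel" blanks the same
    channel range in every frame. Hypothetical helper, not the paper's code.
    """
    rng = rng or random.Random()
    out = [list(frame) for frame in frames]  # copy so the input stays untouched
    n_frames = len(out)
    n_channels = len(out[0]) if out else 0
    size = n_frames if axis == "time" else n_channels
    for _ in range(num_masks):
        if size == 0:
            break
        width = rng.randint(0, min(max_width, size))  # band width, may be 0
        start = rng.randint(0, size - width)          # band start position
        for i in range(start, start + width):
            if axis == "time":
                out[i] = [fill] * n_channels
            else:
                for frame in out:
                    frame[i] = fill
    return out


rng = random.Random(0)

# Toy "spectrogram": 20 frames x 8 frequency bins, masked on both axes.
spec = [[1.0] * 8 for _ in range(20)]
masked_spec = mask_bands(spec, axis="time", rng=rng)
masked_spec = mask_bands(masked_spec, axis="channel", rng=rng)

# Toy latent features: 10 frames x 16 dimensions, masked along time only.
latent = [[1.0] * 16 for _ in range(10)]
masked_latent = mask_bands(latent, axis="time", rng=rng)
```

In a real pipeline the first call would sit before the feature extractor and the second between the extractor and the classification head, so that the two masks regularize different stages of the model.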

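The results in the abstract are reported as equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below is a simple threshold sweep for approximating it. It is not the official ASVspoof scoring tool, and it assumes that higher scores mean "more likely bona fide" and that both score lists are non-empty. The function name `equal_error_rate` is chosen here for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER with a sweep over all observed scores.

    Assumes higher score = more likely bona fide and non-empty inputs.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)  # extra threshold that rejects everything
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: bona fide utterances scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed utterances scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores: one bona fide utterance (0.4) scores below one spoof (0.5).
bonafide = [0.9, 0.7, 0.4, 0.8]
spoof = [0.1, 0.5, 0.3, 0.2]
print(equal_error_rate(bonafide, spoof))  # 0.25
```

With only a handful of scores the sweep is coarse. On a full evaluation set the two error curves cross much more finely, and scoring tools typically interpolate between neighbouring thresholds.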

Sound; Audio and Speech Processing
