Multi-modal contrastive learning adapts to intrinsic dimensions of shared latent variables

Yu Gui, Cong Ma, Zongming Ma

arXiv

May 18, 2025

Multi-modal contrastive learning, a self-supervised representation learning technique, has achieved great success in foundation model training, for example in CLIP. In this paper, we study the theoretical properties of the representations learned by multi-modal contrastive learning, going beyond linear representations and specific data distributions. Our analysis reveals that, enabled by temperature optimization, multi-modal contrastive learning not only maximizes mutual information between modalities but also adapts to the intrinsic dimensions of the data, which can be much lower than the user-specified dimension of the representation vectors. Experiments on both synthetic and real-world datasets demonstrate the ability of contrastive learning to learn low-dimensional and informative representations, bridging theoretical insights and practical performance.
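
As a rough illustration of the temperature-scaled objective the abstract refers to, the following is a minimal sketch of the symmetric InfoNCE loss commonly associated with CLIP-style training. The function names (`normalize`, `cross_entropy_diag`, `symmetric_info_nce`) and the pure-Python formulation are assumptions made for illustration; the paper's exact objective and its temperature-optimization procedure are not given in this digest.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean over rows i of -log softmax(logits[i])[i] (positives on the diagonal)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(xs, ys, temperature):
    """Symmetric contrastive loss; (xs[i], ys[i]) is treated as a positive pair.

    Assumption: cosine similarity divided by a scalar temperature, averaged
    over both matching directions (x -> y and y -> x).
    """
    xs = [normalize(x) for x in xs]
    ys = [normalize(y) for y in ys]
    logits = [
        [sum(a * b for a, b in zip(x, y)) / temperature for y in ys]
        for x in xs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for y -> x
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Two perfectly aligned pairs versus the same pairs with swapped partners.
aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]],
                             [[1.0, 0.0], [0.0, 1.0]], temperature=1.0)
swapped = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]],
                             [[0.0, 1.0], [1.0, 0.0]], temperature=1.0)
```

In this toy setup the aligned pairs give a lower loss than the swapped pairs, and lowering `temperature` drives the aligned loss toward zero. In practice the temperature would typically be a trainable parameter optimized jointly with the encoders rather than a fixed argument as in this sketch.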
