Newsflash - 42Digest

Multi-Modal Masked Autoencoders for Learning Image-Spectrum Associations for Galaxy Evolution and Cosmology

Morgan Himes, Samiksha Krishnamurthy, Andrew Lizarraga, Srinath Saikrishnan, Vikram Seenivasan, Jonathan Soriano, Ying Nian Wu, Tuan Do

arXiv

October 26, 2025

Upcoming surveys will produce billions of galaxy images but comparatively few spectra, motivating models that learn cross-modal representations. We build a dataset of 134,533 galaxy images (HSC-PDR2) and spectra (DESI-DR1) and adapt a Multi-Modal Masked Autoencoder (MMAE) to embed both images and spectra in a shared representation. The MMAE is a transformer-based architecture, which we train by masking 75% of the data and reconstructing missing image and spectral tokens. We use this model to test three applications: spectrum reconstruction and image reconstruction from heavily masked data, and redshift regression from images alone. It recovers key physical features, such as galaxy shapes, atomic emission-line peaks, and broad continuum slopes, though it struggles with fine image details and line strengths. For redshift regression, the MMAE achieves prediction scatter comparable to or better than previous multi-modal models, even when spectra are missing at test time. These results highlight both the potential and the limitations of masked autoencoders in astrophysics and motivate extending such foundation models to additional modalities, such as text.
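The masked-autoencoder training described above can be illustrated with a small NumPy sketch of the input side only: each modality is cut into tokens, projected into one shared embedding space, and 75% of the combined token sequence is hidden so that only the visible tokens reach the encoder. Everything here is an assumption for illustration, not the authors' code: the image size (64x64, 5 bands), the spectrum length (7781 pixels), the patch widths, the embedding width `d_model`, and the helper names `patchify_image`, `patchify_spectrum`, and `random_mask` are hypothetical. The projections are random stand-ins for learned weights, position and modality embeddings are omitted, and masking uniformly over the concatenated sequence is only one possible scheme (the paper may mask each modality separately).

```python
import numpy as np


def patchify_image(img: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch tokens of length patch*patch*C."""
    h, w, c = img.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    x = img.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return x.reshape(-1, patch * patch * c)


def patchify_spectrum(flux: np.ndarray, width: int) -> np.ndarray:
    """Split a 1-D spectrum into contiguous wavelength chunks, dropping any remainder."""
    n = (len(flux) // width) * width
    return flux[:n].reshape(-1, width)


def random_mask(num_tokens: int, mask_ratio: float, rng: np.random.Generator):
    """Randomly choose which token indices stay visible and which are masked."""
    num_mask = int(round(mask_ratio * num_tokens))
    perm = rng.permutation(num_tokens)
    return np.sort(perm[num_mask:]), np.sort(perm[:num_mask])


rng = np.random.default_rng(0)
d_model = 32  # hypothetical shared embedding width

# Hypothetical inputs: a 5-band 64x64 cutout and a 7781-pixel spectrum.
img_tokens = patchify_image(rng.normal(size=(64, 64, 5)), patch=8)  # (64, 320)
spec_tokens = patchify_spectrum(rng.normal(size=7781), width=100)   # (77, 100)

# Modality-specific linear embeddings (random stand-ins for learned weights)
# map both token types into one shared d_model-dimensional space.
w_img = rng.normal(size=(img_tokens.shape[1], d_model)) / np.sqrt(img_tokens.shape[1])
w_spec = rng.normal(size=(spec_tokens.shape[1], d_model)) / np.sqrt(spec_tokens.shape[1])
tokens = np.concatenate([img_tokens @ w_img, spec_tokens @ w_spec])  # (141, d_model)
modality = np.array([0] * len(img_tokens) + [1] * len(spec_tokens))  # 0=image, 1=spectrum

# Mask 75% of the combined token sequence; only visible tokens go to the encoder,
# and a decoder would be trained to reconstruct the masked ones.
keep_idx, mask_idx = random_mask(len(tokens), mask_ratio=0.75, rng=rng)
encoder_input = tokens[keep_idx]
print(encoder_input.shape, len(mask_idx))  # prints (35, 32) 106
print("masked image/spectrum tokens:", np.bincount(modality[mask_idx], minlength=2))
```

Because the visible set is drawn from both modalities at once, some training examples leave a galaxy with most of its spectrum hidden, which is loosely analogous to the image-only setting used for redshift regression at test time.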

Instrumentation and Methods for Astrophysics · Astrophysics of Galaxies · Machine Learning
