Fundamental limits of learning in sequence multi-index models and deep attention networks: High-dimensional asymptotics and sharp thresholds

Emanuele Troiani, Hugo Cui, Yatin Dandi, Florent Krzakala, Lenka Zdeborová

arXiv
February 2, 2025

In this manuscript, we study the learning of deep attention neural networks, defined as the composition of multiple self-attention layers, with tied and low-rank weights. We first establish a mapping of such models to sequence multi-index models, a generalization of the widely studied multi-index model to sequential covariates, for which we establish a number of general results. In the context of Bayesian-optimal learning, in the limit of large dimension D and commensurately large number of samples N, we provide a sharp asymptotic characterization of the optimal performance, as well as of the performance of the best-known polynomial-time algorithm for this setting, namely approximate message passing, and we characterize a sharp threshold on the minimal sample complexity required to outperform random prediction. In particular, our analysis reveals how the different layers are learned sequentially. Finally, we discuss how this sequential learning can also be observed in a realistic setting.
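
To make the setup concrete, below is a minimal NumPy sketch (not the authors' code) of a deep attention network with tied, low-rank weights, under illustrative assumptions: each layer l carries a single weight matrix W_l of shape D x r with r much smaller than D, used for both queries and keys, so the attention scores depend on the input sequence X only through the low-dimensional projections X W_l; the 1/sqrt(D) scaling and the absence of separate value weights are choices made here for illustration, and the exact parameterization and readout analyzed in the paper may differ.

import numpy as np

def softmax(S, axis=-1):
    # Numerically stable row-wise softmax.
    S = S - S.max(axis=axis, keepdims=True)
    E = np.exp(S)
    return E / E.sum(axis=axis, keepdims=True)

def tied_lowrank_attention(X, W, D):
    # One self-attention layer with tied query/key weights W of shape (D, r).
    P = X @ W / np.sqrt(D)   # low-dimensional projections, shape (L, r)
    A = softmax(P @ P.T)     # attention matrix built only from X @ W
    return A @ X             # re-weighted sequence, shape (L, D)

def deep_attention(X, weights, D):
    # Composition of several tied, low-rank self-attention layers.
    for W in weights:
        X = tied_lowrank_attention(X, W, D)
    return X

# Toy instantiation: L tokens of dimension D, two layers of rank r.
rng = np.random.default_rng(0)
L, D, r = 8, 256, 2
X = rng.standard_normal((L, D))
weights = [rng.standard_normal((D, r)) for _ in range(2)]
out = deep_attention(X, weights, D)
print(out.shape)  # (8, 256)

The tied, rank-r structure is what lets each layer's attention scores be summarized by a handful of projections of X, which is roughly the intuition behind the abstract's mapping of such networks to sequence multi-index models.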