活水快报 - 42Digest

Gradient Flow Equations for Deep Linear Neural Networks: A Survey from a Network Perspective

Joel Wendin and Claudio Altafini

arXiv

November 13, 2025

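The paper surveys recent progress in understanding the dynamics and loss landscape of the gradient flow equations associated with deep linear neural networks, i.e., the gradient descent training dynamics (in the limit when the step size goes to 0) of deep neural networks that lack activation functions and are subject to quadratic loss functions. When formulated in terms of the adjacency matrix of the neural network, as we do in the paper, these gradient flow equations form a class of converging matrix ODEs that is nilpotent, polynomial, isospectral, and endowed with conservation laws. The loss landscape is described in detail. It is characterized by infinitely many global minima and saddle points, both strict and nonstrict, but it lacks local minima and maxima. The loss function itself is a positive semidefinite Lyapunov function for the gradient flow, and its level sets are unbounded invariant sets of critical points, with critical values that correspond to the number of singular values of the input-output data learned by the gradient along a certain trajectory. The adjacency matrix representation we use in the paper highlights the existence of a quotient space structure in which each critical value of the loss function is represented only once, while all other critical points with the same critical value belong to the fiber associated with the quotient space. It also makes it easy to determine the stable and unstable submanifolds at a saddle point, even when the Hessian fails to provide them.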
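As a rough illustration of the setting described in the abstract, the sketch below approximates the gradient flow of a two-layer linear network with forward Euler steps of small size. It is not the paper's adjacency matrix formulation. The function name `simulate`, the toy data, the layer sizes, and the whitened inputs are all assumptions made for the example. The quantity `W1 @ W1.T - W2.T @ W2` is a standard invariant of this flow and is assumed here to be representative of the conservation laws the abstract mentions.

```python
import numpy as np


def simulate(steps=20000, dt=1e-3, seed=0):
    """Forward-Euler approximation of gradient flow for y = W2 @ W1 @ x.

    Hypothetical toy setup: whitened inputs and a fixed rank-2 target.
    """
    rng = np.random.default_rng(seed)
    X = np.eye(3)                            # whitened inputs (assumption)
    Y = np.array([[2.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])          # toy targets, singular values 2 and 1
    W1 = 0.1 * rng.standard_normal((3, 3))   # small random initialization
    W2 = 0.1 * rng.standard_normal((2, 3))

    def loss(W1, W2):
        # Quadratic loss 0.5 * ||Y - W2 W1 X||_F^2
        return 0.5 * np.linalg.norm(Y - W2 @ W1 @ X) ** 2

    def invariant(W1, W2):
        # Conserved exactly along the continuous-time flow
        return W1 @ W1.T - W2.T @ W2

    c0 = invariant(W1, W2)
    losses = [loss(W1, W2)]
    for _ in range(steps):
        E = Y - W2 @ W1 @ X                  # residual
        G1 = W2.T @ E @ X.T                  # negative gradient w.r.t. W1
        G2 = E @ X.T @ W1.T                  # negative gradient w.r.t. W2
        W1 = W1 + dt * G1                    # simultaneous Euler update
        W2 = W2 + dt * G2
        losses.append(loss(W1, W2))

    # Euler steps break the invariant only at order dt**2 per step
    drift = np.linalg.norm(invariant(W1, W2) - c0)
    return np.array(losses), drift


if __name__ == "__main__":
    losses, drift = simulate()
    print("initial loss:", losses[0])
    print("final loss:", losses[-1])
    print("invariant drift:", drift)
```

With a small step size, the loss should decrease along the trajectory, consistent with its role as a Lyapunov function, and the drift of the invariant should stay small. Shrinking `dt` should shrink the drift further, since the invariant is conserved only in the continuous-time limit.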
