Global Convergence of Four-Layer Matrix Factorization under Random Initialization
Minrui Luo, Weihang Xu, Xiang Gao, Maryam Fazel, Simon Shaolei Du
Gradient descent dynamics on the deep matrix factorization problem is extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-layer matrix factorization is well established, no global convergence guarantee for general deep matrix factorization under random initialization has been established to date. To address this gap, we provide a polynomial-time global convergence guarantee for randomly initialized gradient descent on four-layer matrix factorization, given certain conditions on the target matrix and a standard balanced regularization term. Our analysis employs new techniques to show the saddle-avoidance property of the gradient descent dynamics, and extends previous theory to characterize the evolution of the eigenvalues of the layer weights.
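For concreteness, a minimal sketch of the optimization problem described above: assuming the balanced regularization term takes the standard form from the deep matrix factorization literature, penalizing imbalance between consecutive layers (the exact form and the weight \lambda below are our assumptions, not taken from the paper), randomly initialized gradient descent would be run on

\[
\min_{W_1, W_2, W_3, W_4} \; \frac{1}{2}\bigl\| W_4 W_3 W_2 W_1 - M^\star \bigr\|_F^2 \;+\; \frac{\lambda}{2} \sum_{i=1}^{3} \bigl\| W_{i+1}^{\top} W_{i+1} - W_i W_i^{\top} \bigr\|_F^2,
\]

where M^\star is the target matrix, \|\cdot\|_F is the Frobenius norm, and \lambda > 0 weights the balancedness penalty.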