Learning Over-parameterized Neural Networks: From Neural Tangent Kernel to Mean-field Analysis

Quanquan Gu
University of California, Los Angeles (UCLA)
Computer Science

Deep learning has achieved tremendous success in many applications. However, why deep learning is so powerful is still not well understood. A recent line of research on deep learning theory focuses on the extremely over-parameterized setting and shows that deep neural networks trained by (stochastic) gradient descent enjoy nice optimization and generalization guarantees in the so-called neural tangent kernel (NTK) regime. However, many have argued that existing NTK-based results cannot explain the success of deep learning, mainly for two reasons: (i) most results require an extremely wide neural network, which is impractical, and (ii) NTK-based analyses require the network parameters to stay very close to their initialization throughout training, which does not match empirical observations. In this talk, I will explain how these limitations of current NTK-based analyses can be alleviated. In the first part of the talk, I will show that, under certain assumptions, we can prove optimization and generalization guarantees with network width polylogarithmic in the training sample size and the inverse target test error. In the second part of the talk, I will introduce a mean-field analysis in a generalized neural tangent kernel regime and show that noisy gradient descent with weight decay can still exhibit “kernel-like” behavior. Our analysis allows the network parameters trained by noisy gradient descent to move far away from their initialization. Our results push the theoretical analysis of over-parameterized deep neural networks towards a more practical setting.
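To make the objects in the abstract concrete, the following is a minimal, hypothetical sketch (not taken from the talk or the slides) of noisy gradient descent with weight decay on a two-layer ReLU network of width m, the kind of training dynamics discussed above. All hyperparameters (width m, step size eta, weight decay lam, inverse temperature beta) are illustrative assumptions; the script also reports the distance of the trained weights from their initialization, the quantity whose size separates the NTK regime from the mean-field regime.

import numpy as np

# Minimal illustrative sketch (assumed setup, not from the talk):
# two-layer ReLU network f(x) = (1/sqrt(m)) * sum_j a_j * ReLU(w_j . x),
# trained by noisy gradient descent with weight decay on a toy regression task.

rng = np.random.default_rng(0)
n, d, m = 20, 5, 1000                      # samples, input dimension, network width
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.sign(X[:, 0])                       # toy labels

a = rng.choice([-1.0, 1.0], size=m)        # fixed second-layer signs (a common NTK-style setup)
W = rng.standard_normal((m, d))            # trainable first-layer weights
W0 = W.copy()                              # keep the initialization for comparison

def forward(W):
    h = np.maximum(X @ W.T, 0.0)           # ReLU features, shape (n, m)
    return h @ a / np.sqrt(m)              # network outputs f(x_i)

def grad(W):
    pre = X @ W.T
    act = (pre > 0).astype(float)          # ReLU derivative
    r = forward(W) - y                     # residuals of the squared loss
    # gradient of (1/(2n)) * sum_i (f(x_i) - y_i)^2 with respect to W
    return (act * (r[:, None] * a[None, :])).T @ X / (n * np.sqrt(m))

eta, lam, beta = 0.1, 1e-3, 1e4            # step size, weight decay, inverse temperature (assumed values)
for t in range(500):
    noise = rng.standard_normal(W.shape)   # Gaussian noise, Langevin-type injection
    W = W - eta * (grad(W) + lam * W) + np.sqrt(2.0 * eta / beta) * noise

print("training loss:", 0.5 * np.mean((forward(W) - y) ** 2))
print("distance from initialization:", np.linalg.norm(W - W0))

In an NTK-style analysis the reported distance from initialization must remain small relative to the scale of the weights, whereas the mean-field viewpoint described in the abstract allows it to grow large.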

