Thermodynamic limits for neural networks

Andrea Montanari
Stanford University

Modern neural network architectures often comprise thousands of hidden units and millions of weights, which are trained via gradient descent (GD) or stochastic gradient descent (SGD).
In physics, systems with a large number of degrees of freedom often admit a simplified (macroscopic)
description. Is there an analogous macroscopic description of the dynamics of multi-layer neural networks?
I will focus on the case of two-layer (one-hidden-layer) fully connected networks, and will discuss two
specific ways to take the large system limit. These mathematical constructions capture two regimes
of the learning process:
1) The lazy regime, in which the network essentially behaves as a linear random features model;
2) The mean field regime, in which the network follows a genuinely non-linear dynamics and learns good
representations of the data (the two scalings are sketched below).

I will compare the two regimes, and discuss for which learning tasks we expect to see a separation between them.
[Based on joint work with Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz]
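
A minimal sketch of the two limits, using the scalings that are conventional in this literature (the notation f_N, alpha_N, a_i, w_i, sigma is introduced here for illustration, and the normalizations used in the talk itself may differ): write the two-layer network with N hidden units as

\[
  f_N(x) \;=\; \alpha_N \sum_{i=1}^{N} a_i \,\sigma(\langle w_i, x \rangle),
  \qquad
  \alpha_N = \tfrac{1}{\sqrt{N}} \ \text{(lazy / NTK scaling)},
  \qquad
  \alpha_N = \tfrac{1}{N} \ \text{(mean field scaling)}.
\]

Under the lazy scaling, gradient descent moves each weight only slightly, so the trained network is well approximated by its linearization at initialization, i.e. a random features (kernel) model. Under the mean field scaling, the empirical distribution of the neurons, \( \tfrac{1}{N}\sum_{i=1}^{N} \delta_{(a_i, w_i)} \), evolves in the \( N \to \infty \) limit according to a nonlinear PDE (a Wasserstein gradient flow), so the features themselves change during training.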

Presentation (PDF File)
