Multiscale analysis of accelerated gradient methods in machine learning

Mohammad Farazmand
North Carolina State University
Mathematics

Accelerated gradient descent iterations are widely used in optimization and, in particular, in machine learning. It is known that, in the continuous-time limit, these iterations converge to a second-order differential equation, which we refer to as the accelerated gradient flow. Using geometric singular perturbation theory, we show that, under certain conditions, the accelerated gradient flow possesses an attracting invariant slow manifold to which its trajectories converge asymptotically. We obtain a general explicit expression, in the form of a functional series expansion, that approximates the slow manifold to any desired order of accuracy. To leading order, the accelerated gradient flow reduced to this slow manifold coincides with the usual gradient descent. We illustrate the implications of our results with three examples.
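
A minimal sketch of such a reduction, assuming purely for illustration an accelerated gradient flow of heavy-ball form $\epsilon \ddot{x} + \dot{x} = -\nabla f(x)$ with a small parameter $0 < \epsilon \ll 1$; the precise flow and the conditions treated in the talk may differ. Positing that trajectories on the slow manifold satisfy a first-order equation $\dot{x} = h_\epsilon(x)$, expanding $h_\epsilon = h_0 + \epsilon h_1 + \cdots$, and matching powers of $\epsilon$ gives

\begin{align*}
  O(1):\quad        & h_0(x) = -\nabla f(x), \\
  O(\epsilon):\quad & h_1(x) = -\nabla h_0(x)\, h_0(x) = -\nabla^2 f(x)\, \nabla f(x),
\end{align*}

so that on the slow manifold $\dot{x} = -\nabla f(x) + O(\epsilon)$: to leading order the reduced flow is exactly gradient descent, with the $O(\epsilon)$ term providing a curvature-dependent correction.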

