A Lyapunov analysis for accelerated gradient methods: From deterministic to stochastic case

Maxime Laborde
McGill University

Su, Boyd and Cand├ęs showed that Nesterov's method for accelerated gradient descent can be obtained as the discretization of an Ordinary Differential Equation (ODE). By discretizing a modified ODE we show: (i) with a constant learning rate, we obtain Nesterov's method, (ii) with a decreasing learning rate, we obtain an accelerated stochastic gradient descent algorithm. We prove, using a Lyapunov function approach, that for both the convex and strongly convex cases, the algorithms converge at the optimal rate for the last iterate of SGD, with rate constants which are better than previously available.

Presentation (PDF File)

Back to Workshop II: PDE and Inverse Problem Methods in Machine Learning