Stochastic gradient-based optimization algorithms play perhaps the most important role in modern machine learning, in particular deep learning. Nesterov accelerated gradient (NAG) is a celebrated technique for accelerating gradient descent; however, NAG fails in stochastic gradient descent (SGD), where gradient noise destroys the acceleration.
In this talk, I will discuss some recent progress in leveraging NAG and restart techniques to accelerate SGD. I will also discuss how momentum can be leveraged to design deep neural nets in a mathematically mechanistic manner. This is joint work with Tan Nguyen, Richard Baraniuk, Andrea Bertozzi, and Stan Osher.
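To make the NAG-with-restart idea concrete, here is a minimal sketch in the deterministic setting: classical Nesterov acceleration combined with a standard function-value restart, which resets the momentum whenever the objective increases. This is a generic illustration of the restart heuristic, not the specific scheme presented in the talk; the objective, step size, and restart criterion below are illustrative assumptions.

```python
import numpy as np

def nag_restart(f, grad, x0, lr, iters=200):
    """Nesterov accelerated gradient with function-value restart.

    A generic sketch of the restart heuristic (not the talk's method):
    momentum is reset to zero whenever the objective increases.
    """
    x = np.asarray(x0, dtype=float)
    y, t = x.copy(), 1.0          # y: lookahead iterate, t: momentum schedule
    for _ in range(iters):
        f_prev = f(x)
        x_new = y - lr * grad(y)  # gradient step at the lookahead point
        t_new = (1 + (1 + 4 * t * t) ** 0.5) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)  # momentum extrapolation
        if f(x_new) > f_prev:     # restart: objective went up, kill momentum
            y, t_new = x_new.copy(), 1.0
        x, t = x_new, t_new
    return x

# Usage: an ill-conditioned quadratic f(x) = 0.5 x^T A x (assumed test problem)
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x_star = nag_restart(f, grad, np.array([1.0, 1.0]), lr=0.009)
```

On such ill-conditioned problems, the restart prevents the oscillations that plain NAG momentum can induce, recovering a fast linear rate.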