For large-scale machine learning problems, second-order methods are impractical, so first-order (gradient-based) optimization methods are used instead. These include gradient descent, stochastic gradient descent, and their accelerated variants. In this tutorial we will review the basic ideas behind these algorithms, their convergence results, and their connections with ordinary differential equations.
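To make the three algorithm families concrete, here is a minimal NumPy sketch on a synthetic least-squares objective; the function names, step sizes, and data are illustrative assumptions, not examples taken from the tutorial itself.

```python
# Minimal sketch (illustrative assumptions throughout): gradient descent,
# stochastic gradient descent, and Nesterov-accelerated gradient descent
# on the synthetic objective f(x) = (1/2n) * ||A x - b||^2.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad(x, idx=None):
    """Full gradient, or an unbiased stochastic gradient from sample idx."""
    if idx is None:
        return A.T @ (A @ x - b) / n
    a_i = A[idx]
    return a_i * (a_i @ x - b[idx])

def gradient_descent(steps=500, lr=0.1):
    x = np.zeros(d)
    for _ in range(steps):
        x -= lr * grad(x)            # step along the full negative gradient
    return x

def sgd(steps=5000, lr=0.01):
    x = np.zeros(d)
    for t in range(steps):
        i = rng.integers(n)          # sample one data point uniformly
        x -= (lr / np.sqrt(t + 1)) * grad(x, i)  # decaying step size
    return x

def nesterov(steps=500, lr=0.1):
    x = np.zeros(d)
    x_prev = x.copy()
    for t in range(steps):
        y = x + (t / (t + 3)) * (x - x_prev)  # momentum look-ahead point
        x_prev = x
        x = y - lr * grad(y)
    return x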