A Stochastic Quasi-Newton Method for Large Scale Learning

Jorge Nocedal
Northwestern University

We present a quasi-Newton method that operates in the stochastic approximation regime. It is designed for the minimization of convex stochastic functions, and it is efficient, robust, and scalable. In contrast to previous attempts at designing stochastic quasi-Newton methods, our approach does not employ differences of gradients, but instead gathers curvature information pointwise and feeds this information into the well-known BFGS formula, which is entrusted with the task of computing an inverse Hessian approximation. We present numerical results on text classification and speech recognition problems.
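The idea sketched in the abstract, collecting curvature pointwise rather than from gradient differences and passing it to a BFGS update, can be illustrated with a small example. The sketch below is not the authors' algorithm; it is a minimal, hypothetical illustration in which the curvature pair is formed from a subsampled Hessian-vector product and the least-squares problem, batch sizes, step size, and update frequency `L` are all assumed purely for demonstration.

```python
import numpy as np

# Hypothetical problem: linear least squares f(x) = (1/2n) ||A x - b||^2,
# used only to give the sketch something concrete to minimize.
rng = np.random.default_rng(0)
n, d = 10_000, 20
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def stochastic_grad(x, idx):
    """Minibatch gradient of the least-squares objective."""
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

def subsampled_hessian_vector(s, idx):
    """Curvature gathered 'pointwise': a subsampled Hessian-vector product
    y = H_S s, instead of a difference of stochastic gradients."""
    Ai = A[idx]
    return Ai.T @ (Ai @ s) / len(idx)

def bfgs_inverse_update(H, s, y):
    """Standard BFGS update of the inverse Hessian approximation H."""
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

x = np.zeros(d)
H = np.eye(d)                        # inverse Hessian approximation
x_avg_prev, x_avg, count = x.copy(), np.zeros(d), 0
batch, hess_batch, L, alpha = 50, 500, 10, 0.05   # illustrative choices

for t in range(1, 2001):
    idx = rng.choice(n, batch, replace=False)
    x -= (alpha / np.sqrt(t)) * (H @ stochastic_grad(x, idx))

    # Accumulate an averaged iterate; every L steps form a curvature pair.
    x_avg += x
    count += 1
    if t % L == 0:
        x_avg /= count
        s = x_avg - x_avg_prev
        y = subsampled_hessian_vector(s, rng.choice(n, hess_batch, replace=False))
        if s @ y > 1e-10:            # keep H positive definite
            H = bfgs_inverse_update(H, s, y)
        x_avg_prev, x_avg, count = x_avg, np.zeros(d), 0

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))
```

Because the objective here is convex and the curvature pair comes from an actual (subsampled) Hessian applied to the displacement `s`, the pair satisfies `s @ y > 0` in expectation, which is what lets the BFGS formula maintain a positive-definite inverse Hessian approximation.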
