High order methods in empirical risk minimization

Alejandro Ribeiro
University of Pennsylvania

Empirical risk minimization (ERM) problems express optimal classifiers as solutions of optimization problems in which the objective is the sum of a very large number of sample costs. Established approaches to solve ERM rely on computing stochastic gradient directions by accessing a single summand at each iteration. Despite the efficiency of individual iterations, these methods can be slow to converge and have convergence rates that are linear at best. In this talk we discuss approaches to adapt Newton and quasi-Newton methods for ERM problems. In the incremental quasi-Newton method we exploit memory to store curvature approximation matrices. We show that these curvature approximations succeed in approximating the Hessian and thereby lead to superlinear convergence. In the Adaptive Newton method we consider subsets of training samples that are augmented geometrically by a factor of two. Each time the training set is augmented we perform a single Newton step. We show that it is possible to achieve statistical accuracy with just two passes over the dataset.

Presentation (PDF File)

Back to Emerging Wireless Networks