Stochastic Gradient Descent with Importance Sampling

Rachel Ward
University of Texas at Austin
Mathematics

In this talk we provide improved estimates on the exponential convergence rate of stochastic gradient descent for smooth, strongly convex objectives, in the regime where all stochastic gradient estimates share a common optimum, so that such an exponential rate is achievable. We then show that by incorporating importance sampling -- perturbing the uniform row selection rule in the direction of sampling estimates with probability proportional to the Lipschitz constants of their gradients -- the convergence rate of SGD can be improved dramatically: the dependence on the average squared condition number of the system is replaced by a dependence on the average (unsquared) condition number.
This is joint work with Deanna Needell and Nati Srebro.
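The sampling rule described above can be illustrated with a short sketch. The following is a minimal example (not the speakers' code) of SGD with Lipschitz-proportional importance sampling on a least-squares objective, where each component gradient has Lipschitz constant equal to the squared row norm; the function name, parameters, and synthetic data are illustrative assumptions, not part of the talk.

```python
# Minimal sketch: SGD with importance sampling for
# F(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)^2  (a consistent system).
# Each component gradient has Lipschitz constant L_i = ||a_i||^2; rows are
# sampled with probability p_i proportional to L_i, and the gradient is
# reweighted by 1/(n * p_i) so the stochastic estimate remains unbiased.
# All names and parameter choices here are illustrative assumptions.

import numpy as np

def sgd_importance_sampling(A, b, step_size=0.001, n_iters=50_000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = np.sum(A**2, axis=1)          # per-row Lipschitz constants
    p = L / L.sum()                   # sampling probabilities proportional to L_i
    x = np.zeros(d)
    for _ in range(n_iters):
        i = rng.choice(n, p=p)                      # importance-sampled index
        grad_i = (A[i] @ x - b[i]) * A[i]           # gradient of component f_i
        x -= step_size * grad_i / (n * p[i])        # unbiased reweighted step
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Rows with widely varying norms, so uniform and weighted sampling differ.
    A = rng.standard_normal((200, 10)) * rng.uniform(0.1, 5.0, size=(200, 1))
    x_true = rng.standard_normal(10)
    b = A @ x_true                                  # noiseless, shared optimum
    x_hat = sgd_importance_sampling(A, b)
    print("error:", np.linalg.norm(x_hat - x_true))
```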

