Optimal rates for zero-order optimization: the power of two function evaluations

John Duchi
University of California, Berkeley (UC Berkeley)

We consider derivative-free algorithms for stochastic and non-stochastic optimization problems that use only function values rather than gradients. Focusing on non-asymptotic bounds on convergence rates, we show that if pairs of function values are available, algorithms for d-dimensional optimization that use gradient estimates based on random perturbations suffer a factor of at most \sqrt{d} in convergence rate over traditional stochastic gradient methods. We establish such results for both smooth and non-smooth cases, sharpening previous analyses that suggested a worse dimension dependence. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, establishing the sharpness of our achievable results up to constant factors.

Based on joint work with Michael Jordan, Martin Wainwright, and Andre Wibisono

URL: http://arxiv.org/abs/1312.2139

Back to Stochastic Gradient Methods