Stochastic Optimization and Sparse Statistical Recovery: An Optimal Algorithm in High Dimensions
Alekh Agarwal
Microsoft Research
Abstract
We develop and analyze stochastic optimization algorithms for
problems in which the expected loss is strongly convex, and the optimum is
(approximately) sparse. Previous approaches are able to exploit only one of
these two structures, yielding an $\mathcal{O}(d/T)$ convergence rate for
strongly convex objectives in $d$ dimensions, and an
$\mathcal{O}(\sqrt{(s \log d)/T})$ convergence rate when the optimum
is $s$-sparse. Our algorithm is based on successively solving a
series of $\ell_1$-regularized optimization problems using Nesterov's dual
averaging algorithm. We establish that the error of our solution after $T$
iterations is at most $\mathcal{O}((s \log d)/T)$, with natural
extensions to approximate sparsity. Our results apply to locally Lipschitz
losses including the logistic, exponential, hinge and least-squares losses.
By recourse to statistical minimax results, we show that our convergence
rates are optimal up to multiplicative constant factors. The effectiveness
of our approach is also confirmed in numerical simulations, in which we
compare to several baselines on a least-squares regression problem.
[Based on joint work with Sahand Negahban and Martin Wainwright]
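The multi-stage idea (repeatedly solving $\ell_1$-regularized problems with dual averaging, each restarted from the previous solution) can be illustrated with a minimal Python sketch. This is not the authors' algorithm. It swaps in a simpler $\ell_1$-regularized dual averaging update in the style of Xiao's RDA with a Euclidean prox-function centered at the current estimate, omits any constraint set, and uses a hypothetical schedule (halve the penalty, double the epoch length). The helper names and the parameters `gamma`, `lam0`, `steps0` and `n_epochs` are illustrative assumptions, as is the synthetic least-squares problem.

```python
import numpy as np


def soft_threshold(v, tau):
    """Coordinate-wise soft-thresholding, the proximal map of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def l1_dual_averaging_epoch(sample, grad, center, lam, gamma, n_steps):
    """One epoch of l1-regularized dual averaging (RDA-style, assumed form).

    At step t the iterate solves, in closed form,
        argmin_theta <g_bar, theta> + lam * ||theta||_1
                     + (gamma / (2 * sqrt(t))) * ||theta - center||^2,
    where g_bar is the running average of the stochastic gradients.
    Returns the running average of the iterates.
    """
    theta = center.copy()
    grad_avg = np.zeros_like(center)
    theta_avg = np.zeros_like(center)
    for t in range(1, n_steps + 1):
        x, y = sample()
        grad_avg += (grad(theta, x, y) - grad_avg) / t  # running mean of gradients
        k = gamma / np.sqrt(t)                         # prox weight at step t
        theta = soft_threshold(center - grad_avg / k, lam / k)
        theta_avg += (theta - theta_avg) / t           # running mean of iterates
    return theta_avg


def multistage_l1_dual_averaging(sample, grad, dim, lam0, gamma, steps0, n_epochs):
    """Hypothetical multi-epoch wrapper: each epoch restarts dual averaging
    from the previous epoch's averaged iterate."""
    center = np.zeros(dim)
    lam, steps = lam0, steps0
    for _ in range(n_epochs):
        center = l1_dual_averaging_epoch(sample, grad, center, lam, gamma, steps)
        lam /= 2.0   # assumed schedule: halve the l1 penalty ...
        steps *= 2   # ... and double the epoch length
    return center


# Synthetic s-sparse least-squares problem (all sizes are illustrative).
rng = np.random.default_rng(0)
dim = 50
support = [3, 17, 41]
theta_star = np.zeros(dim)
theta_star[support] = [1.0, -2.0, 1.5]


def sample():
    """Draw (x, y) with x ~ N(0, I) and y = <x, theta_star> + small noise."""
    x = rng.standard_normal(dim)
    return x, x @ theta_star + 0.1 * rng.standard_normal()


def grad(theta, x, y):
    """Gradient of the least-squares loss 0.5 * (<x, theta> - y)^2."""
    return (x @ theta - y) * x


theta_hat = multistage_l1_dual_averaging(sample, grad, dim, lam0=0.1,
                                         gamma=20.0, steps0=1000, n_epochs=4)
print(np.round(theta_hat[support], 2))
```

Restarting each epoch from the previous averaged iterate is what lets later epochs work on a smaller error, while the soft-thresholding step keeps the iterates sparse. The schedule above is only a plausible choice for this toy problem and carries none of the guarantees stated in the abstract.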