Recent Advances in SPSA at the Two Extremes: Adaptive Methods for Smooth Problems and Discrete Methods for Non-Smooth Problems

James Spall
Johns Hopkins University
Applied Physics Laboratory

The talk will discuss two key extensions to the basic simultaneous perturbation stochastic approximation (SPSA) algorithm. Note that while SPSA in its basic form is not formally a stochastic gradient method (i.e., it is based on noisy objective function measurements rather than unbiased gradient measurements), it belongs to the same general family of “first-order” stochastic approximation methods. The talk will present recent results for problems at two extremes: SPSA for adaptive estimation in smooth problems, where we also wish to obtain a Hessian estimate, and SPSA-type ideas for fully discrete problems.
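For readers unfamiliar with the basic algorithm, the following is a minimal sketch of the standard two-measurement SPSA recursion; the gain sequences, tuning constants, and function names are illustrative assumptions, not details from the talk.

```python
import numpy as np

def spsa_minimize(loss, theta0, iterations=1000, a=0.1, c=0.1,
                  alpha=0.602, gamma=0.101, A=100, rng=None):
    """Basic SPSA: minimize a loss using only noisy loss measurements.

    The gradient is approximated from two measurements at the
    simultaneously perturbed points theta +/- c_k * delta, where delta
    has independent +/-1 (Rademacher) components.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha      # step-size (gain) sequence
        c_k = c / (k + 1) ** gamma          # perturbation-size sequence
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        y_plus = loss(theta + c_k * delta)
        y_minus = loss(theta - c_k * delta)
        # Simultaneous perturbation gradient estimate: two measurements
        # regardless of dimension; note 1/delta = delta for +/-1 entries.
        g_hat = (y_plus - y_minus) / (2.0 * c_k) * (1.0 / delta)
        theta -= a_k * g_hat
    return theta

# Usage on an illustrative noisy quadratic loss.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_loss = lambda th: np.sum((th - 1.0) ** 2) + 0.01 * rng.normal()
    print(spsa_minimize(noisy_loss, np.zeros(5), rng=rng))
```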
For the smooth case, it is known that a stochastic analogue of the deterministic Newton-Raphson algorithm provides an asymptotically optimal or near-optimal form of search for optimization (or root-finding). However, directly determining the required Hessian matrix has often been difficult or impossible in practice. We review state-of-the-art methods for solving the optimization problem of interest while simultaneously estimating the Hessian matrix.
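As one illustration of the idea, the sketch below follows the general structure of second-order (adaptive) SPSA: a per-iteration Hessian estimate is formed from two additional perturbed loss measurements and averaged across iterations, and the averaged estimate drives a Newton-like step. The gain values, the eigenvalue-based positive-definite mapping, and the function names are illustrative assumptions rather than the exact methods reviewed in the talk.

```python
import numpy as np

def adaptive_spsa_step(loss, theta, H_bar, k, a=0.1, c=0.1, c_tilde=0.1,
                       alpha=0.602, gamma=0.101, A=100, rng=None):
    """One iteration of a stochastic Newton-like (adaptive SPSA) update.

    Four noisy loss measurements per iteration: two form the usual SPSA
    gradient estimate, and two more (re-perturbed by a second direction
    delta_t) form a per-iteration Hessian estimate that is averaged into
    the running estimate H_bar.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = theta.size
    a_k = a / (k + 1 + A) ** alpha
    c_k = c / (k + 1) ** gamma
    ct_k = c_tilde / (k + 1) ** gamma

    delta = rng.choice([-1.0, 1.0], size=p)
    delta_t = rng.choice([-1.0, 1.0], size=p)

    y_plus = loss(theta + c_k * delta)
    y_minus = loss(theta - c_k * delta)
    g_hat = (y_plus - y_minus) / (2.0 * c_k) * (1.0 / delta)

    # One-sided gradient estimates at the two perturbed points, using the
    # second perturbation delta_t.
    g_plus = (loss(theta + c_k * delta + ct_k * delta_t) - y_plus) / ct_k * (1.0 / delta_t)
    g_minus = (loss(theta - c_k * delta + ct_k * delta_t) - y_minus) / ct_k * (1.0 / delta_t)
    dG = g_plus - g_minus

    # Symmetrized per-iteration Hessian estimate and running average.
    H_hat = np.outer(dG, 1.0 / delta) / (2.0 * c_k)
    H_hat = 0.5 * (H_hat + H_hat.T)
    H_bar = (k / (k + 1.0)) * H_bar + (1.0 / (k + 1.0)) * H_hat

    # Map the averaged estimate to a positive-definite matrix before the
    # Newton-like step (one simple choice: eigenvalue magnitudes plus a shift).
    w, V = np.linalg.eigh(H_bar)
    H_pd = (V * (np.abs(w) + 1e-3)) @ V.T
    theta_new = theta - a_k * np.linalg.solve(H_pd, g_hat)
    return theta_new, H_bar

# Usage on an illustrative noisy quadratic loss.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_loss = lambda th: np.sum((th - 1.0) ** 2) + 0.01 * rng.normal()
    theta, H_bar = np.zeros(4), np.zeros((4, 4))
    for k in range(2000):
        theta, H_bar = adaptive_spsa_step(noisy_loss, theta, H_bar, k, rng=rng)
    print(theta)
```

The mapping of the averaged Hessian estimate to a positive-definite matrix before inversion is essential; the eigenvalue-magnitude mapping above is only one simple option.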
For the non-smooth case, we introduce the middle point discrete simultaneous perturbation stochastic approximation (DSPSA) algorithm for the optimization of a noisy loss function defined on a multidimensional grid of points in Euclidean space. The sequence generated by DSPSA can be shown to converge to the optimal point under appropriate conditions. Further, the rate of convergence of DSPSA can be analyzed by deriving an upper bound on the mean squared error of the generated sequence. This error bound allows for a formal comparison of the performance of DSPSA with other discrete algorithms such as the stochastic ruler algorithm and the stochastic comparison algorithm.
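The following is a minimal sketch of the middle-point DSPSA recursion under the common convention that the grid is the integer lattice; the gain constants and names are illustrative assumptions.

```python
import numpy as np

def dspsa_minimize(loss, theta0, iterations=1000, a=0.5, alpha=0.602,
                   A=100, rng=None):
    """Middle-point discrete SPSA (DSPSA) on the integer lattice.

    theta is iterated in continuous space, but every loss measurement is
    taken at a grid point: pi(theta) is the middle point of the unit
    hypercube containing theta, so pi(theta) +/- delta/2 has integer
    coordinates when delta has +/-1 components.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha
        pi = np.floor(theta) + 0.5          # middle point of containing cube
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        y_plus = loss(np.rint(pi + 0.5 * delta).astype(int))
        y_minus = loss(np.rint(pi - 0.5 * delta).astype(int))
        g_hat = (y_plus - y_minus) * (1.0 / delta)   # 1/delta = delta here
        theta -= a_k * g_hat
    return np.rint(theta).astype(int)       # report the nearest grid point

# Usage on an illustrative noisy loss over the integer grid.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    noisy_loss = lambda x: np.sum((x - 3) ** 2) + 0.1 * rng.normal()
    print(dspsa_minimize(noisy_loss, np.zeros(4), rng=rng))
```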

Acknowledgment: The discrete work is joint with Dr. Qi Wang of Johns Hopkins University.

