Adaptive Gradient (AdaGrad) and Generalized Accelerated Gradient Ascent (GAGA): two sides of the same coin?

Yoram Singer
Google Inc.

In this talk I will review two popular algorithms that are used in numerous
large-scale machine learning applications at Google. The first is a stochastic
optimization technique that reshapes its gradients on the fly so as to
accommodate the geometry of the examples. The second is a simple yet very
effective variation on Nesterov's accelerated gradient method that uses simple
curvature estimates to reshape its heavy balls into heavy ellipses. The end
result in both cases is an algorithm that performs awesomely well when learning
from massive, high-dimensional datasets in which the examples are very sparse.
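To make the first idea concrete, here is a minimal sketch of the standard diagonal AdaGrad update (accumulate squared gradients per coordinate and scale the step by their inverse square root), which is the per-coordinate adaptation the abstract alludes to. The function and parameter names (adagrad_step, lr, eps) and the toy sparse least-squares loop are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def adagrad_step(w, grad, grad_sq_sum, lr=0.1, eps=1e-8):
    """One diagonal-AdaGrad update: each coordinate gets its own learning
    rate, scaled by the inverse root of its accumulated squared gradients."""
    grad_sq_sum += grad ** 2
    w -= lr * grad / (np.sqrt(grad_sq_sum) + eps)
    return w, grad_sq_sum

# Toy usage: streaming squared loss on very sparse examples, where
# per-coordinate step sizes help rarely seen features catch up.
rng = np.random.default_rng(0)
d = 10
w, g_sq = np.zeros(d), np.zeros(d)
w_true = rng.normal(size=d)
for _ in range(1000):
    x = np.zeros(d)
    x[rng.integers(d)] = 1.0          # one active feature per example
    y = w_true @ x
    grad = (w @ x - y) * x            # gradient of 0.5 * (w.x - y)^2
    w, g_sq = adagrad_step(w, grad, g_sq, lr=0.5)
```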

