What is the role of memory in learning and optimization? The optimal convergence rates (measures in terms of the number of oracle queries or samples needed) for various optimization problems are achieved by computationally expensive optimization techniques, such as second-order methods and cutting-plane methods. We will discuss if simpler, faster and memory-limited algorithms such as gradient descent can achieve these optimal convergence rates for the prototypical optimization problem of minimizing a convex function with access to a gradient or a stochastic gradient oracle. Our results hint at a perhaps curious dichotomy---it is not possible to significantly improve on the convergence rate of known memory efficient techniques (which are linear-memory variants of gradient descent for many of these problems) without using substantially more memory (quadratic memory for many of these problems). Therefore memory could be a useful discerning factor to provide a clear separation between 'efficient' and 'expensive' techniques. Finally, we also discuss how exploring the landscape of memory-limited optimization sheds light on new problem structures where it is possible to circumvent our lower bounds, and suggests new variants of gradient descent.
Based on joint works with Annie Marsden, Aaron Sidford, Gregory Valiant, Jonathan Kelner and Honglin Yuan.