As deep neural networks grow in size, the data and time required to train them become prohibitively large for a single compute node. Distributed deep learning on thousands of GPUs forces mini-batch stochastic gradient descent methods to operate in a regime where the increasing batch size starts to have a detrimental effect on convergence and generalization. We investigate the possibility of using second-order optimization methods with proper regularization as an alternative to conventional stochastic gradient descent methods.
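As an illustrative sketch only (not a description of the specific method investigated here), a generic regularized second-order update replaces the SGD step $\theta_{t+1} = \theta_t - \eta\, g_t$ with a damped Newton-type step:
\[
\theta_{t+1} = \theta_t - \eta\, (H_t + \lambda I)^{-1} g_t ,
\]
where $g_t$ is the mini-batch gradient, $H_t$ is the Hessian or a tractable approximation to it (e.g., a Gauss-Newton or Fisher matrix), and $\lambda > 0$ is a damping (regularization) parameter that keeps the update well-conditioned.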