PDE Approaches for Deep Learning

Stanley Osher
University of California, Los Angeles (UCLA)

Joint work with many people, especially Bao Wang.

Recently, links between partial differential equations (PDEs) and deep neural networks (DNNs) have been established in several interesting directions. We used ideas from Hamilton-Jacobi (HJ) equations, control, and differential games to reduce training time and to modify and improve the training algorithm.
We propose a very simple modification of gradient descent and stochastic gradient descent. We show that when applied to a variety of machine learning models and settings, including softmax regression, convolutional neural nets, generative adversarial nets, differential privacy, and deep reinforcement learning, this very simple surrogate can dramatically reduce variance and also improve accuracy. The new algorithm, which depends on one nonnegative parameter, tends to avoid local minima when applied to nonconvex minimization. We also present a simple connection between transport equations and deep residual nets, based on stochastic control. This connection enabled us to improve the adversarial robustness and generalization accuracy of neural nets. Again, the programming changes needed to make these improvements are minimal in cost, complexity, and effort.
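
The abstract does not spell out the modification of gradient descent. One scheme that fits the description (a single nonnegative parameter and a minimal code change) is Laplacian smoothing of the gradient, as described in related work by Osher, Wang, and coauthors; treating it as the method meant here is an assumption. In this minimal sketch, the gradient g is replaced by the solution u of (I - sigma*L) u = g, where L is the one-dimensional discrete Laplacian with periodic boundary conditions, solved with the FFT. The function names `laplacian_smooth` and `ls_gradient_descent`, the step size, and the quadratic test problem are all hypothetical choices made for illustration.

```python
import numpy as np


def laplacian_smooth(grad, sigma):
    """Return u solving (I - sigma * L) u = grad.

    L is the periodic 1D discrete Laplacian (stencil 1, -2, 1), so the
    matrix I - sigma * L is circulant and is diagonalized by the FFT.
    sigma >= 0 is the single smoothing parameter; sigma = 0 returns
    the gradient unchanged.
    """
    if sigma < 0:
        raise ValueError("sigma must be nonnegative")
    g = np.asarray(grad, dtype=float)
    flat = g.ravel()
    n = flat.size
    # Eigenvalues of I - sigma * L: 1 + 2*sigma*(1 - cos(2*pi*k/n)) >= 1.
    k = np.arange(n)
    eig = 1.0 + 2.0 * sigma * (1.0 - np.cos(2.0 * np.pi * k / n))
    u = np.fft.ifft(np.fft.fft(flat) / eig).real
    return u.reshape(g.shape)


def ls_gradient_descent(grad_fn, w0, lr=0.5, sigma=1.0, steps=500):
    """Plain gradient descent, except each gradient is smoothed first."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * laplacian_smooth(grad_fn(w), sigma)
    return w


if __name__ == "__main__":
    # Hypothetical test problem: minimize 0.5 * ||w - target||^2.
    target = np.linspace(-1.0, 1.0, 8)
    w = ls_gradient_descent(lambda w: w - target, np.zeros(8))
    print(np.max(np.abs(w - target)))
```

Because every eigenvalue of I - sigma*L is at least 1, the smoothed vector is never longer than the original gradient, and its mean is unchanged. For stochastic gradient descent the same smoothing step would be applied to each minibatch gradient; the only change to an existing training loop is the one extra call.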

Workshop III: Geometry of Big Data