Recent progress in deep neural networks has been based in part on
optimizing "dropout performance": the expected performance
when a large fraction of the network's parameters are randomly set to zero. The
optimization is typically done with a form of stochastic gradient descent.
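The procedure described above can be sketched on a toy problem; the setup below (a linear model with squared loss, "inverted" dropout scaling, and the specific learning rate and drop probability) is an illustrative assumption, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: linear model, squared loss on synthetic data.
X = rng.normal(size=(64, 10))
y = X @ rng.normal(size=10)

def loss(w):
    return float(np.mean((X @ w - y) ** 2))

w = np.zeros(10)
lr, drop_prob = 0.05, 0.5
initial_loss = loss(w)

for _ in range(1000):
    # Sample a Bernoulli mask: each weight is zeroed with probability
    # drop_prob; survivors are scaled by 1/(1 - drop_prob) so that the
    # perturbed weights are unbiased ("inverted dropout").
    mask = (rng.random(w.shape) >= drop_prob) / (1.0 - drop_prob)
    residual = X @ (w * mask) - y
    grad = mask * (X.T @ residual) / len(y)  # chain rule through the mask
    w -= lr * grad  # one SGD step on the randomly perturbed loss

final_loss = loss(w)
```

Each step draws a fresh mask, so SGD is optimizing the expected loss under random zeroing of the parameters rather than the loss of any single fixed network.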
Intuitively, a neural network that is robust to dropout
perturbations should have better generalization properties --- it
should perform better on novel inputs. Stochastic model perturbation
is the fundamental concept underlying PAC-Bayesian generalization
theory. This talk will briefly summarize PAC-Bayesian generalization
theory and give a regularization bound for a simple form of dropout
training as a straightforward application.
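As background for the summary promised above, one common form of the PAC-Bayesian theorem (constants vary across presentations; the talk's specific dropout bound is not reproduced here) states that with probability at least $1-\delta$ over a sample of size $n$,

\[
\mathbb{E}_{h \sim Q}\!\left[L(h)\right] \;\le\; \mathbb{E}_{h \sim Q}\!\left[\hat{L}(h)\right] \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{n}{\delta}}{2(n-1)}},
\]

where $P$ is a "prior" distribution over models fixed before seeing the data, $Q$ is any "posterior" (here, the stochastic model given by dropout perturbations), $\hat{L}$ is empirical risk, and $L$ is true risk. If a network's performance survives the dropout perturbation, the first term on the right stays small while the KL term acts as a regularizer, which is the sense in which dropout robustness suggests better generalization.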