Dropout is a simple yet effective regularization technique that has been applied to various machine learning tasks, including linear classification, matrix factorization and deep learning. However, the theoretical properties of dropout as a regularizer remain quite elusive. This talk will present a theoretical analysis of dropout for single hidden-layer linear neural networks. We demonstrate that dropout is a stochastic gradient descent method for minimizing a certain regularized loss. We show that the regularizer induces solutions that are low-rank, in the sense of minimizing the number of neurons. We also show that the global optimum is balanced, in the sense that the product of the norms of incoming and outgoing weight vectors of all the hidden nodes equal. Finally, we provide a complete characterization of the optimization landscape induced by dropout.
Back to Workshop IV: Deep Geometric Learning of Big Data and Applications