We consider the problem of end-to-end learning, i.e., going from data to decisions via multiple stages of computation that are jointly optimized. Standard stochastic gradient-based training fails if some of the computations are discrete and hence non-differentiable. This challenge arises, for instance, in structured prediction, where we seek to learn to parameterize (e.g., via a neural net) instances of an optimization problem whose solutions agree with our data. It also arises in the architectural design of deep neural networks with structured attention mechanisms. We show that for a rich class of discrete computations modeled via submodular function optimization, one can efficiently smooth the discrete objective probabilistically in order to obtain gradients for end-to-end training. Our approaches exploit polyhedral and algorithmic properties of submodular functions to provably preserve the semantics of the discrete computations. I will present the basic methodological ideas and discuss several applications.
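As a concrete illustration of how discrete submodular objectives admit a differentiable surrogate (a sketch, not necessarily the specific smoothing used in the talk), the Lovász extension of a submodular set function is a classical continuous extension: it is convex and piecewise linear on [0,1]^n, agrees with the set function on 0/1 indicator vectors (so the discrete semantics are preserved), and admits subgradients everywhere, which can be backpropagated through. The function `F` below is a hypothetical example.

```python
import numpy as np

def lovasz_extension(F, x):
    """Lovász extension of a set function F (with F(empty set) = 0)
    at a point x in [0,1]^n.

    For submodular F this extension is convex and piecewise linear,
    so it has subgradients everywhere -- usable for gradient-based
    end-to-end training.
    """
    order = np.argsort(-x)            # sort coordinates in decreasing order
    value, prev, S = 0.0, 0.0, set()
    for i in order:
        S.add(int(i))
        cur = F(frozenset(S))
        value += x[i] * (cur - prev)  # marginal gain, weighted by x_i
        prev = cur
    return value

# Hypothetical submodular function: F(S) = sqrt(|S|) (coverage-like)
F = lambda S: np.sqrt(len(S))

# At a fractional point, the extension interpolates the discrete values.
print(lovasz_extension(F, np.array([0.9, 0.2, 0.5])))

# At a 0/1 indicator vector, it recovers F exactly: here F({0, 2}) = sqrt(2).
print(lovasz_extension(F, np.array([1.0, 0.0, 1.0])))
```

Because the extension coincides with F on indicator vectors, optimizing the relaxation and rounding the result does not change the objective at integral solutions, which is one sense in which a relaxation can "preserve the semantics" of the discrete computation.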
Back to Workshop IV: New Architectures and Algorithms