Linear programming and sparse inference are among the best-studied optimization and estimation problems, where geometry, statistics, and combinatorics interact in profound ways. In this talk, we will describe two instances where these problems appear naturally in the analysis of neural networks.
First, we focus on the problem of overparametrised learning with a shallow ReLU network. This problem is NP-hard in the worst case, yet is routinely solved in practice by gradient-based learning on typical instances. We will present a framework that reconciles these two facts by reducing the overparametrised learning problem to a finite-dimensional linear program, leveraging convex geometry.
Next, we look at the problem of sparse inference from a computational perspective. Although the L1-relaxed problem is a linear program that can be solved efficiently with interior-point methods, in practice iterative shrinkage algorithms such as (F)ISTA are employed. Unrolling such an algorithm formally yields a deep, narrow network that can be optimized at a fixed depth (Gregor and LeCun). We ask whether depth is necessary for such sparse inference tasks by establishing approximation lower bounds for shallow, wide networks instead.
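To make the unrolling concrete, here is a minimal NumPy sketch of ISTA viewed as a fixed-depth network: each iteration is one layer, a shared affine map followed by a soft-thresholding nonlinearity. The problem sizes, the regularization weight `lam`, and the shared (untrained) weights are illustrative assumptions; in the learned variant of Gregor and LeCun (LISTA), the matrices `W` and bias `c` below become trainable parameters.

```python
import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of the L1 norm: shrink each entry toward zero by tau.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def unrolled_ista(A, b, lam, depth):
    # Solve min_x 0.5*||Ax - b||^2 + lam*||x||_1 by running `depth`
    # ISTA iterations; each iteration is one "layer" of the unrolled network.
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    W = np.eye(A.shape[1]) - A.T @ A / L   # layer weight matrix (shared across layers)
    c = A.T @ b / L                        # layer bias (input-dependent)
    x = np.zeros(A.shape[1])
    for _ in range(depth):
        x = soft_threshold(W @ x + c, lam / L)
    return x
```

For example, with a random 20x50 matrix `A` and a 3-sparse ground truth, a few hundred layers already drive the LASSO objective well below its value at zero; the question in the talk is what happens when this depth is traded for width.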
Joint work with Jaume de Dios (UCLA) and Luca Venturi (Courant).