On the existence of wide flat minima in neural network landscapes: analytic and algorithmic approaches

Carlo Baldassi
Bocconi University

The techniques currently used for training neural networks are often very effective at avoiding overfitting and finding solutions that generalize well, even when applied to very complex architectures in an overparametrized regime. This phenomenon is currently poorly understood. Building on a framework that we have been developing in recent years, based on a large-deviation statistical physics analysis, we have studied analytically, numerically, and algorithmically the structural properties of simplified models in relation to the existence and accessibility of so-called "wide flat minima" of the loss function. We have investigated the effect of the ReLU transfer function and of the cross-entropy loss function, contrasted these devices with others that do not exhibit the same phenomena, and developed message-passing and greedy local-search algorithms that exploit the analytical findings.
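As an illustration of the kind of quantity the abstract refers to, the following is a minimal sketch (not the authors' code, and with all model sizes, noise widths, and function names chosen as assumptions for illustration) of a "flatness probe" on a toy binary perceptron: a zero-error solution is found by gradient descent on the cross-entropy loss, and the width of the surrounding flat region is then estimated by measuring how often Gaussian perturbations of the weights leave the training error at zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic teacher-student data: N inputs, P = alpha*N binary-labeled patterns.
N, alpha = 200, 0.5
P = int(alpha * N)
X = rng.choice([-1.0, 1.0], size=(P, N))
teacher = rng.standard_normal(N)
y = np.sign(X @ teacher)

def train_cross_entropy(X, y, lr=0.05, epochs=2000):
    """Gradient descent on the cross-entropy loss for a single perceptron."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-y * (X @ w)))   # probability of the correct label
        grad = -((1.0 - p) * y) @ X / len(y)      # averaged cross-entropy gradient
        w -= lr * grad
        if np.all(np.sign(X @ w) == y):           # stop once training error is zero
            break
    return w

def local_flatness(w, X, y, sigma, n_samples=200):
    """Fraction of Gaussian weight perturbations of relative width `sigma`
    that keep the training error at zero -- a crude proxy for the local
    volume around the solution (a 'wide flat minimum' has a large one)."""
    scale = sigma * np.linalg.norm(w) / np.sqrt(len(w))
    ok = 0
    for _ in range(n_samples):
        w_pert = w + scale * rng.standard_normal(len(w))
        ok += np.all(np.sign(X @ w_pert) == y)
    return ok / n_samples

w = train_cross_entropy(X, y)
for sigma in (0.05, 0.1, 0.2, 0.4):
    print(f"sigma={sigma:.2f}  fraction of zero-error perturbations: "
          f"{local_flatness(w, X, y, sigma):.2f}")
```

Solutions for which this fraction decays slowly with the perturbation width correspond to wide, flat regions of the loss landscape; the analysis described in the abstract characterizes when such regions exist and whether algorithms can reach them.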

