Deriving Equivalent Networks and Hyperparameter Optimization

Srikumar Ramalingam
University of Utah

In the first part, I will talk about deriving equivalent networks: networks that produce the same output for any given input. We study the possibility of exactly transforming a deep neural network into another equivalent network with a different number of layers and different widths. In particular, we show that any feed-forward network with L hidden layers that uses rectified linear units (ReLUs) can be transformed into a 2-hidden-layer shallow network. While such a transformation normally requires an exponential number of neurons, we show that it can be done with relatively few neurons by establishing a connection to the linear regions of piecewise linear networks that use ReLU activation functions. Using mixed integer programming solvers, we also identify neurons that are stably inactive, which allows us to derive equivalent networks with lossless model compression.
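To make the "stably inactive" idea concrete, here is a minimal sketch that certifies some ReLU neurons as provably off over a bounded input set using interval bound propagation. This is a cheaper relaxation, not the exact mixed integer programming formulation used in the talk; the layer shapes, weights, and input box are illustrative assumptions.

```python
# Sketch: certify stably inactive ReLU neurons via interval arithmetic.
# A neuron whose pre-activation upper bound is negative over the whole
# input box outputs 0 everywhere, so removing it is lossless.
import numpy as np

def preactivation_bounds(W, b, lo, hi):
    """Bounds on W @ x + b for all x in the elementwise box [lo, hi]."""
    W_pos = np.maximum(W, 0.0)
    W_neg = np.minimum(W, 0.0)
    upper = W_pos @ hi + W_neg @ lo + b
    lower = W_pos @ lo + W_neg @ hi + b
    return lower, upper

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
b = rng.normal(size=8) - 3.0       # negative biases make some units stably off
lo, hi = -np.ones(4), np.ones(4)   # assumed input box, not from the talk

lower, upper = preactivation_bounds(W, b, lo, hi)
stably_inactive = upper < 0.0      # ReLU output is provably 0 on the box
print("stably inactive units:", np.flatnonzero(stably_inactive))
```

The exact MIP approach replaces these interval bounds with an optimization over the network's constraints, which certifies strictly more neurons at higher cost.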
In the second part, I will show some results on lossy model compression of deep residual networks. In particular, we discard redundant layers with marginal or no loss in performance. This is achieved using a modified ResNet architecture, with a few additional ReLUs, that examines deep residual layers and discards those that produce very small responses. I will conclude with compelling vision applications of deep residual networks in the context of autonomous driving: semantic boundary detection, 3D registration of point clouds, and image-based localization.
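The layer-discarding criterion can be sketched as follows: measure each residual branch's response norm relative to its input, and drop blocks whose response is negligible, since y = x + F(x) is then nearly the identity. The two-layer block structure, random weights, and threshold below are illustrative assumptions, not the talk's actual architecture.

```python
# Sketch: prune residual blocks with very small branch responses.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = x + W2 @ relu(W1 @ x): a minimal residual block."""
    return x + W2 @ relu(W1 @ x)

def block_response_ratio(x, W1, W2):
    """||F(x)|| / ||x||, where F is the residual branch alone."""
    f = W2 @ relu(W1 @ x)
    return np.linalg.norm(f) / (np.linalg.norm(x) + 1e-12)

rng = np.random.default_rng(1)
d = 6
x = rng.normal(size=d)
blocks = [(s * rng.normal(size=(d, d)), s * rng.normal(size=(d, d)))
          for s in (1.0, 1e-4, 1.0)]  # middle block is nearly a no-op

keep = []
for i, (W1, W2) in enumerate(blocks):
    if block_response_ratio(x, W1, W2) > 1e-3:  # assumed pruning threshold
        keep.append(i)
    x = residual_block(x, W1, W2)  # forward pass continues regardless

print("blocks kept after pruning:", keep)
```

In practice this statistic would be averaged over a validation set rather than a single input, and discarding a block trades a small accuracy loss for reduced depth and inference cost.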
