In many deep learning scenarios, the network has more parameters than there are training examples. In such situations there are often many networks that exactly interpolate the data, so the learning algorithm itself induces an implicit bias that determines which of these networks is chosen.
This talk will discuss the nature of this implicit bias for gradient descent algorithms in the simplified setting of linear networks, i.e., deep matrix factorizations. Numerical experiments and initial theoretical work suggest that the product of the gradient descent iterates, i.e., the linear map computed by the network, converges to a matrix of low rank. We present rigorous theoretical results for a further simplified matrix estimation scenario. In particular, we give a precise analysis of the dynamics of the effective rank of the iterates.
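The kind of experiment referred to above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method or setting analyzed in the talk. It assumes a fully observed square target Y, the loss 0.5 * ||W_N ... W_1 - Y||_F^2, square factors with small i.i.d. Gaussian initialization, plain gradient descent with a fixed step size, and the entropy-based effective rank of Roy and Vetterli (exp of the Shannon entropy of the normalized singular values). The function names `effective_rank` and `gd_deep_factorization`, the target matrix, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def effective_rank(M, eps=1e-12):
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized singular values of M (Roy and Vetterli's definition)."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero singular values (0 * log 0 := 0)
    return float(np.exp(-np.sum(p * np.log(p))))


def gd_deep_factorization(Y, depth=3, init_scale=0.02, lr=0.1, steps=4000, seed=0):
    """Gradient descent on 0.5 * ||W_depth ... W_1 - Y||_F^2 (Y assumed square).

    Factors start from small i.i.d. Gaussian entries and are updated
    simultaneously. Returns the final end-to-end product and the effective
    rank of the product before every step and after the last one.
    """
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    Ws = [init_scale * rng.standard_normal((n, n)) for _ in range(depth)]
    history = []
    for _ in range(steps):
        # prefix[k] = Ws[k-1] @ ... @ Ws[0]; prefix[0] is the identity.
        prefix = [np.eye(n)]
        for W in Ws:
            prefix.append(W @ prefix[-1])
        history.append(effective_rank(prefix[-1]))
        residual = prefix[-1] - Y
        grads = [None] * depth
        left = np.eye(n)  # holds Ws[depth-1] @ ... @ Ws[j+1]
        for j in reversed(range(depth)):
            # Chain rule: gradient of 0.5*||left W_j prefix[j] - Y||^2 in W_j.
            grads[j] = left.T @ residual @ prefix[j].T
            left = left @ Ws[j]
        Ws = [W - lr * g for W, g in zip(Ws, grads)]
    product = np.eye(n)
    for W in Ws:
        product = W @ product
    history.append(effective_rank(product))
    return product, history


if __name__ == "__main__":
    # Hypothetical rank-2 target with singular values 2.0 and 0.5.
    rng = np.random.default_rng(1)
    U, _ = np.linalg.qr(rng.standard_normal((10, 2)))
    V, _ = np.linalg.qr(rng.standard_normal((10, 2)))
    Y = U @ np.diag([2.0, 0.5]) @ V.T
    P, ranks = gd_deep_factorization(Y)
    print("relative error:", np.linalg.norm(P - Y) / np.linalg.norm(Y))
    print("effective rank: start", ranks[0], "min", min(ranks),
          "end", ranks[-1], "target", effective_rank(Y))
```

With small initialization and well-separated singular values, one would expect the recorded effective rank to drop close to 1 while the larger singular value is being fitted, and only later rise toward `effective_rank(Y)` once the smaller one is learned. Increasing `init_scale` tends to blur this incremental behavior.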
We discuss a number of open problems and possible extensions to learning low-rank tensor decompositions.
Workshop IV: Efficient Tensor Representations for Learning and Computational Complexity