Understanding Multiview and Self-Supervised Representation Learning: A Nonlinear Mixture Identification Perspective

Xiao Fu
Oregon State University

Central to representation learning is succinctly representing high-dimensional data using the "essential information" while discarding the "redundant information." Properly formulating and approaching this objective is critical for guarding against overfitting, and can also benefit many important tasks such as domain adaptation and transfer learning. This talk aims to deepen the understanding of representation learning and to use the gained insights to develop a new learning method. In particular, attention will be paid to two representation learning paradigms that use multiple views of data, as both naturally acquired (e.g., image and audio) and artificially produced (e.g., via adding different noise to data samples) multiview data have empirically proven useful for producing vector representations that reflect the essential information. Natural views are often handled by multiview analysis tools, e.g., (deep) canonical correlation analysis [(D)CCA], while artificial views are frequently used in self-supervised learning (SSL) paradigms, e.g., BYOL and Barlow Twins. However, the effectiveness of these methods has mostly been validated empirically, and more insights and theoretical underpinnings remain to be discovered.

In this talk, an intuitive generative model of multiview data is adopted, in which the views are different nonlinear mixtures of shared and private components. Since the shared components are view/distortion-invariant, they can represent the essential information of the data in a non-redundant way. Under this model, a key module used in a suite of DCCA and SSL paradigms, namely latent correlation maximization, is shown to guarantee the extraction of the shared components across views (up to certain ambiguities). It is further shown that the private information in each view can be provably disentangled from the shared information using a properly designed regularization, which can facilitate tasks such as cross-view translation and data generation. A finite-sample analysis, which has been rare in nonlinear mixture identifiability studies, is also presented. The theoretical results and the newly designed regularization are tested on a series of tasks.
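To make the adopted setup concrete, the LaTeX sketch below writes out one common form of the generative model and of the latent correlation maximization module discussed above. The notation (the shared component c, the private components p^{(q)}, the view-wise nonlinear mixing functions g^{(q)}, and the learned encoders f^{(q)}) is introduced here for illustration and is an assumption, not necessarily the talk's exact formulation.

% A minimal, self-contained sketch (compiles with pdflatex) of the
% multiview generative model and the latent correlation maximization
% module described above. The symbols c, p^{(q)}, g^{(q)}, f^{(q)} are
% assumed for illustration.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}

% Generative model: each view q is a different nonlinear mixture of a
% shared (view/distortion-invariant) component c and a view-specific
% private component p^{(q)}.
\begin{equation}
  \boldsymbol{x}^{(q)}
  = \boldsymbol{g}^{(q)}\!\big(\boldsymbol{c}, \boldsymbol{p}^{(q)}\big),
  \qquad q = 1, 2.
\end{equation}

% Latent correlation maximization: learn view-wise encoders whose outputs
% match across views while each output has identity covariance (a
% whitening constraint), as in (D)CCA-type and several SSL criteria.
\begin{equation}
  \min_{\boldsymbol{f}^{(1)},\,\boldsymbol{f}^{(2)}}
  \ \mathbb{E}\Big[\big\|\boldsymbol{f}^{(1)}(\boldsymbol{x}^{(1)})
    - \boldsymbol{f}^{(2)}(\boldsymbol{x}^{(2)})\big\|_2^2\Big]
  \quad \text{s.t.} \quad
  \mathbb{E}\Big[\boldsymbol{f}^{(q)}(\boldsymbol{x}^{(q)})\,
    \boldsymbol{f}^{(q)}(\boldsymbol{x}^{(q)})^{\top}\Big]
  = \boldsymbol{I},\ q = 1, 2.
\end{equation}

\end{document}

In terms of the abstract's claims, the identifiability result says that minimizers of a criterion of this type extract the shared component across views (up to certain ambiguities), while the proposed regularization further disentangles each view's private component from the shared one.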

