Virtual Talk: Extracting Insights from Complex Data: Constrained Multimodal Data Mining using Coupled Matrix and Tensor Factorizations

Evrim Acar
Simula Research Laboratory
Computer Science

To understand complex systems such as the human metabolome (i.e., the complete set of small biochemical compounds in the body) or the brain, the system must be measured using different sensing technologies. This produces a surge of data of unprecedented complexity, driven mainly by heterogeneous multimodal data sets, some of which evolve in time while others are static. For instance, measurements of blood samples collected at multiple time points form a dynamic metabolomics data set showing how metabolites change over time. These measurements can be arranged as a multiway array (also referred to as a higher-order tensor) with one mode corresponding to time, e.g., subjects by metabolites by time. Tensor factorizations have proved useful for revealing the underlying patterns in such higher-order tensors and, applied to dynamic metabolomics data, can reveal the underlying mechanisms, their dynamics, and subject group differences. Joint analysis of dynamic metabolomics data and other data sets such as genetics or gut microbiome data (often in the form of subjects by features matrices) promises a more complete picture of the underlying system and better subject stratifications. Therefore, there is an emerging need to jointly analyze such heterogeneous multimodal data sets and capture the underlying patterns in an interpretable way. Tensor factorizations have been extended to joint analysis of data from multiple sources through coupled matrix and tensor factorizations (CMTF). While CMTF-based methods are effective for multimodal data mining, challenges remain, in particular in capturing the underlying patterns and their evolution in time.
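
As a rough illustration of the coupling idea described above, the following sketch fits a basic CMTF model with plain alternating least squares: a subjects by metabolites by time tensor and a subjects by features matrix share one subject-mode factor matrix. This is not the AO-ADMM framework of the talk; the function names (`khatri_rao`, `unfold`, `cmtf_als`), the sizes, and the synthetic data are all assumptions made for illustration, and the model is unconstrained with equal weight on both data sets.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; row (u, v) holds U[u] * V[v], v varying fastest."""
    R = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, R)

def unfold(X, mode):
    """Mode-n unfolding; remaining modes are flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cmtf_als(X, Y, rank, n_iter=500, seed=0):
    """Hypothetical helper: fit X ~ [[A, B, C]] and Y ~ A V^T with shared A."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    V = rng.standard_normal((Y.shape[1], rank))
    for _ in range(n_iter):
        # The shared subject factor A is fitted to both data sets at once.
        design = np.vstack([khatri_rao(B, C), V])
        target = np.hstack([unfold(X, 0), Y])
        A = np.linalg.lstsq(design, target.T, rcond=None)[0].T
        # The remaining factors each depend on one data set only.
        B = np.linalg.lstsq(khatri_rao(A, C), unfold(X, 1).T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), unfold(X, 2).T, rcond=None)[0].T
        V = np.linalg.lstsq(A, Y, rcond=None)[0].T
    return A, B, C, V

# Synthetic, noise-free example (sizes and rank are arbitrary assumptions).
rng = np.random.default_rng(42)
n_subjects, n_metabolites, n_time, n_features, rank = 10, 8, 6, 5, 2
A0 = rng.standard_normal((n_subjects, rank))
B0 = rng.standard_normal((n_metabolites, rank))
C0 = rng.standard_normal((n_time, rank))
V0 = rng.standard_normal((n_features, rank))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)  # subjects x metabolites x time
Y = A0 @ V0.T                               # subjects x features

A, B, C, V = cmtf_als(X, Y, rank)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
err_X = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
err_Y = np.linalg.norm(Y - A @ V.T) / np.linalg.norm(Y)
print(f"relative error, tensor: {err_X:.2e}, matrix: {err_Y:.2e}")
```

Because the subject-mode factor is estimated from the stacked tensor unfolding and matrix, each of its columns describes one pattern across both data sets, which is what makes joint subject stratification possible in this kind of model.
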
In this talk, we first introduce a flexible algorithmic framework based on Alternating Optimization (AO) and the Alternating Direction Method of Multipliers (ADMM) that facilitates the use of a variety of constraints, loss functions, and couplings with linear transformations when fitting CMTF models. Numerical experiments on simulated and real data demonstrate that the proposed AO-ADMM-based approach is accurate, flexible, and computationally efficient, with comparable or better performance than available CMTF algorithms. We then discuss the extension of the framework to joint analysis of dynamic and static data sets by incorporating alternative tensor factorization approaches, which have shown promising performance in revealing evolving patterns in temporal data analysis.
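
For orientation, one common way to write a regularized CMTF problem and the ADMM updates for a single factor is sketched below. It is a simplified assumption, not the exact formulation of the papers listed in this abstract: it uses a squared Frobenius loss, a hard coupling through a shared factor A, and generic regularizers g; the symbols Z_A, mu_A, and rho are the usual ADMM split variable, scaled dual variable, and penalty parameter.

```latex
% Regularized CMTF with a shared subject-mode factor A (illustrative form)
\min_{A,B,C,V}\;
  \big\| \mathcal{X} - [\![ A, B, C ]\!] \big\|_F^2
  + \big\| Y - A V^{\top} \big\|_F^2
  + g_A(A) + g_B(B) + g_C(C) + g_V(V)

% AO: update one factor at a time with the others fixed.
% ADMM for the A-subproblem, with f(A) the data-fitting terms above,
% the split A = Z_A, and scaled dual variable \mu_A:
A     \leftarrow \arg\min_{A}\; f(A) + \tfrac{\rho}{2}\,\| A - Z_A + \mu_A \|_F^2
Z_A   \leftarrow \operatorname{prox}_{g_A/\rho}\!\left( A + \mu_A \right)
\mu_A \leftarrow \mu_A + A - Z_A
```

In this form a constraint such as non-negativity enters only through the proximal operator of g_A, which is a projection onto the non-negative orthant; this is why the splitting makes it straightforward to swap constraints without changing the rest of the update.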
This talk will be based on the following papers and ongoing work in the TrACEr project (https://tracer.simulamet.no/):
C. Schenker, X. Wang, and E. Acar. PARAFAC2-based Coupled Matrix and Tensor Factorizations, arXiv:2210.13054, 2022
M. Roald, C. Schenker, V. D. Calhoun, T. Adali, R. Bro, J. E. Cohen, and E. Acar. An AO-ADMM Approach to Constraining PARAFAC2 on All Modes, SIAM Journal on Mathematics of Data Science, 4(3): 1191-1222, 2022
C. Schenker, J. E. Cohen, and E. Acar. A Flexible Optimization Framework for Regularized Matrix-Tensor Factorizations with Linear Couplings, IEEE Journal of Selected Topics in Signal Processing, 15(3): 506-521, 2021

Back to Explainable AI for the Sciences: Towards Novel Insights