Scalable algorithms for optimal experimental design for large-scale Bayesian inverse problems governed by complex models

Omar Ghattas
University of Texas at Austin

Bayesian inference provides a systematic and rational framework for learning complex physical models from observational data. In the resulting Bayesian inverse problem (BIP), we employ observational data, together with a model mapping parameters to observations, to infer unknown parameters and quantify their uncertainty in the form of a posterior probability distribution. But this raises the question: how do we choose the data in the first place? The problem of choosing the optimal data for a BIP can be formulated as the following optimal experimental design (OED) problem: optimize the design of a data acquisition system (sensor locations, what is measured and how often, etc.) to minimize some measure of the uncertainty in the inferred parameters.
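
To fix ideas, a minimal sketch of the two nested problems follows; the notation (parameter m, data d, design variables \xi, uncertainty measure \Phi) is ours for illustration and is not taken from the talk:

    \pi_{\mathrm{post}}(m \mid d, \xi) \;\propto\; \pi_{\mathrm{like}}(d \mid m, \xi)\, \pi_{\mathrm{prior}}(m),
    \qquad
    \xi^{\star} \;=\; \operatorname*{arg\,min}_{\xi \in \Xi}\; \Phi\big(\pi_{\mathrm{post}}(\cdot \mid d, \xi)\big),

where the inner BIP produces the posterior for a given design \xi, and the outer OED problem chooses \xi to minimize a scalar measure \Phi of posterior uncertainty (here, the negative of the expected information gain introduced below).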

This constitutes a challenge of the highest order. First, the parameters to be inferred in the inverse problem are often infinite-dimensional, spatially correlated fields, whose discretization yields high-dimensional parameter spaces with dimensions in the millions. Second, the physical models to be learned are often complex PDE systems that are computationally expensive to solve. Third, the BIP, difficult as it is, is merely an inner problem within the outer OED problem, to be solved at each optimization iteration. And fourth, efficient optimization algorithms for the OED problem require gradients of the OED objective function with respect to the design parameters, meaning that we must differentiate through the BIP solution.

An attractive objective function for OED is the expected information gain (EIG), which can be understood as the expected Kullback--Leibler (KL) divergence between the posterior and the prior. Naive evaluation of the EIG is intractable for large-scale problems due to the large number of samples required in double-loop Monte Carlo. Here we invoke an approximation of the EIG based on the Laplace approximation of the posterior. Each evaluation of the objective function then requires computing a sample average approximation (over possible realizations of the data) of the information gain (IG) between the Laplace approximation of the posterior and the Gaussian prior. Since the IG involves the KL divergence between two Gaussians, it can be expressed in closed form, in particular in terms of the log-determinant and trace of the prior-preconditioned posterior covariance operator. A randomized eigensolver allows us to estimate these quantities at a cost (measured in number of forward PDE solves) that is independent of the dimension of the parameter space, resulting in scalable evaluation of the OED objective. Variational adjoint methods are then used to efficiently compute the gradient of the OED objective function with respect to the experimental design variables, leading to an efficient and scalable Laplace-approximation-based OED method. Moreover, the resulting method inherits the fine-grained parallel efficiency of the forward PDE solver, despite the tight coupling of the outer OED problem, the inner BIP, and the forward PDE solves. We present numerical results demonstrating the scalability and efficiency of this approach. In particular, the results demonstrate that the number of required PDE solves is independent of the parameter dimension, and only weakly dependent on the data dimension.
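
To make the closed form concrete, the following is a sketch under assumed notation (Gaussian prior N(m_pr, \Gamma_pr), Laplace approximation N(m_MAP(d), \Gamma_post(d)), and prior-preconditioned data-misfit Hessian \widetilde{H} = \Gamma_{\mathrm{pr}}^{1/2} H_{\mathrm{misfit}} \Gamma_{\mathrm{pr}}^{1/2} with eigenvalues \lambda_i); it is one standard way to write these quantities, not necessarily the exact form used in the talk:

    \mathrm{EIG}(\xi) \;=\; \mathbb{E}_{d}\Big[ D_{\mathrm{KL}}\big( \pi_{\mathrm{post}}(\cdot \mid d, \xi) \,\|\, \pi_{\mathrm{prior}} \big) \Big]
    \;\approx\; \frac{1}{N} \sum_{k=1}^{N} \mathrm{IG}(d^{(k)}, \xi),

    \mathrm{IG}(d, \xi) \;=\; \frac{1}{2} \Big[ \sum_{i} \Big( \ln(1 + \lambda_i) - \frac{\lambda_i}{1 + \lambda_i} \Big)
    \;+\; \big\| m_{\mathrm{MAP}}(d) - m_{\mathrm{pr}} \big\|_{\Gamma_{\mathrm{pr}}^{-1}}^{2} \Big],

where the \ln(1+\lambda_i) sum is the log-determinant term and the \lambda_i/(1+\lambda_i) sum is the trace term; only the dominant eigenvalues contribute appreciably, which is what makes a low-rank randomized spectral estimate effective.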
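
A minimal matrix-free sketch of such a randomized spectral estimate is given below in NumPy, purely for illustration; the operator apply_op stands in for the prior-preconditioned data-misfit Hessian (each application of which would, in practice, require forward and adjoint PDE solves), and the function names and oversampling parameter p are our assumptions:

    import numpy as np

    def randomized_eigs(apply_op, n, k, p=10, seed=0):
        # Estimate the k dominant eigenvalues of a symmetric positive
        # semidefinite operator available only through matrix-vector
        # products apply_op(x), using k + p random probing vectors.
        rng = np.random.default_rng(seed)
        Omega = rng.standard_normal((n, k + p))                    # Gaussian test matrix
        Y = np.column_stack([apply_op(Omega[:, j]) for j in range(k + p)])
        Q, _ = np.linalg.qr(Y)                                     # orthonormal basis for the range of Y
        T = Q.T @ np.column_stack([apply_op(Q[:, j]) for j in range(k + p)])
        lam = np.linalg.eigvalsh(0.5 * (T + T.T))                  # small dense eigenproblem (ascending)
        return lam[::-1][:k]                                       # k dominant eigenvalues, descending

    def spectral_ig_terms(lam):
        # Log-determinant and trace contributions to the information gain,
        # expressed through the dominant eigenvalues of the
        # prior-preconditioned data-misfit Hessian.
        lam = np.clip(np.asarray(lam), 0.0, None)
        return 0.5 * np.sum(np.log1p(lam) - lam / (1.0 + lam))

    # Toy usage with an explicit PSD matrix standing in for the PDE-based operator:
    n, k = 200, 20
    A = np.random.default_rng(1).standard_normal((n, n))
    H = A @ A.T / n
    lam = randomized_eigs(lambda x: H @ x, n, k)
    print(spectral_ig_terms(lam))

The eigenvalue estimate uses a fixed number of operator applications, 2(k + p), independent of the discretized parameter dimension n, which mirrors the dimension-independent cost noted above.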

This work is joint with Umberto Villa.

