University of Texas at Austin

Bayesian inference provides a systematic and rational framework for

learning complex physical models from complex observational data. In

the resulting Bayesian inverse problem (BIP), we employ observational

data and a model mapping parameters to observations to infer unknown

parameters and quantify their uncertainty, in the form of a posterior

probability distribution. But this begs the question: how do we choose

the data in the first place? The problem of choosing the optimal

training data for a BIP can be formulated as the following optimal

experimental design (OED) problem: optimize the design of a data

acquisition system (sensor locations, what is measured and how often,

etc.) to minimize some measure of the uncertainty in the inferred

parameters.

This constitutes a challenge of the highest order. First, the

parameters to be inferred in the inverse problem are often infinite

dimensional spatially correlated fields, whose discretization yields

high dimensional parameter spaces that can number in the

millions. Second, the physical models to be learned are often complex

PDE systems that are computationally expensive to solve. Third, the

BIP---difficult as it is---is merely an inner problem within the outer

OED problem, to be solved at each optimization iteration. And fourth,

efficient optimization algorithms for the OED problem require

gradients of the OED objective function with respect to the design

parameters, meaning that we must differentiate through the BIP

solution.

An attractive objective function for OED is the expected information

gain (EIG), which can be understood as the expected Kullback--Leibler

divergence between prior and posterior. Naive evaluation of EIG is

intractable for large-scale problems due to the large number of

samples required in double-loop Monte Carlo. Here we invoke an

approximation of the EIG based on the Laplace approximation of the

posterior. Each evaluation of the objective function then requires

computing a sample average approximation (over possible realizations

of the data) of the information gain (IG) between the Laplace

approximation and the Gaussian prior distribution. Since the IG

involves the KL divergence between two Gaussians, it can be expressed

in closed form, in particular in terms of the log-determinant and

trace of the prior-preconditioned posterior covariance operator. A

randomized eigensolver allows us to estimate these invariants at a

cost---measured in number of forward PDE solves---that is independent

of the dimension of the parameter space, resulting in scalable

evaluation of the OED objective. Variational adjoint methods are then

used to efficiently compute the gradient of the OED objective function

with respect to the experimental design variables, thus leading to an

efficient and scalable Laplace-approximation based OED

method. Moreover, the resulting method inherits the fine-grained

parallel efficiency of the forward PDE solver, despite being tightly

coupled. We present numerical results demonstrating the scalability

and efficiency of this approach. In particular, the results

demonstrate that the number of required PDE solves is independent of

the parameter dimension, and only weakly dependent on the data

dimension.

This work is joint with Umberto Villa.