Numerical Approximation Algorithms for Big Data

Dongbin Xiu
Ohio State University

One of the central tasks in scientific computing is to accurately approximate unknown target functions. This is typically done with the help of data, that is, samples of the unknown functions. In statistics, this task falls into the realm of regression and machine learning; in mathematics, it is the central theme of approximation theory. The emergence of big data presents both opportunities and challenges. On one hand, big data provides more information about the unknowns and, in principle, allows us to create more accurate models. On the other hand, data storage and processing become highly challenging. Moreover, data often contain corruption errors in addition to standard noise. In this talk, we present new developments on selected aspects of big data approximation. Specifically, we present numerical algorithms that address two issues: (1) how to automatically eliminate corruption or bias errors in data; and (2) how to create accurate approximation models in very high-dimensional spaces using streaming (live) data, without the need to store the entire data set. We present both the numerical algorithms, which are easy to implement, and rigorous analysis of their theoretical foundation.
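
To make issue (2) concrete, here is a minimal sketch of fitting an approximation from streaming samples without storing them. It is not taken from the talk: it uses a Kaczmarz-style row-action update, a standard technique chosen here purely for illustration, and it may differ from the algorithms the speaker presents. The names `features`, `stream_fit`, and `sample_stream`, the monomial basis, and the noiseless quadratic target are all assumptions made for the example.

```python
import numpy as np

def features(x, degree):
    """Monomial basis 1, x, ..., x^degree for a scalar x (hypothetical choice)."""
    return x ** np.arange(degree + 1)

def stream_fit(samples, degree):
    """Kaczmarz-style update: one pass over the stream, one sample in memory at a time."""
    c = np.zeros(degree + 1)
    for x, y in samples:
        phi = features(x, degree)
        residual = y - phi @ c
        # Project the current coefficients onto the hyperplane phi . c = y.
        c += residual * phi / (phi @ phi)
    return c

def sample_stream(n, seed=0):
    """Hypothetical data source: noiseless samples of 1 + 2x - 0.5x^2 on [-1, 1]."""
    rng = np.random.default_rng(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 1.0 + 2.0 * x - 0.5 * x**2

# Memory use is O(degree), independent of the number of samples consumed.
coeffs = stream_fit(sample_stream(20000), degree=2)
```

In this sketch each sample is discarded after its update, so storage depends only on the number of basis functions. With noisy or corrupted samples the plain update above would not converge to the exact coefficients; handling that case is the kind of question the talk addresses.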

Workshop III: Data Assimilation, Uncertainty Reduction, and Optimization for Subsurface Flow