A growing challenge: what are we going to do with data generated in exascale simulations?

Vasily V. Bulatov
Lawrence Livermore National Lab

Looking back at the development of multi-scale simulation methods over the last 30 years, one observes that, at least in the narrow sub-domain of the simulation sciences focused on materials under extreme conditions, the smart multi-scale methods we have worked on so enthusiastically over the years have yet to prove truly enabling. Moreover, where a wide scale gap previously separated fully atomistic and meso-scale simulations, that gap has been steadily narrowing, and in a number of simulation contexts it has already closed, owing to ever-growing computational capabilities. Where already feasible, direct MD simulations of materials are superseding multi-scale simulations, a trend likely to continue into the next “exascale decade”.

This brings new opportunities and new challenges. As an opportunity, MD and mesoscale simulations performed on the same length and time scales and under identical conditions expose behaviors and mechanisms present in a fully resolved atomistic simulation yet missing from, or incorrectly accounted for in, the counterpart mesoscale simulation, a practice we refer to as cross-scale (X-scale) comparison and matching. At the same time, atomistic simulations generate enormous amounts of trajectory data, e.g. a few Googles' worth of data in just one simulation day on the Sierra pre-exascale supercomputer. Recording and storing such data streams far exceeds present state-of-the-art I/O and disk capacities, resulting in the irrevocable loss of nearly all of the potentially informative trajectory data. And even when a tiny subset of the trajectory data is indeed recorded, it can be overwhelmingly large for post-processing. Can we – humans and/or machines – meaningfully learn from exascale data streams?
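The claim that such data streams overwhelm present I/O and storage can be made concrete with a back-of-envelope estimate. The sketch below is illustrative only: the atom count, bytes per atom, dump interval, and simulated time per day are assumed parameters chosen for a large pre-exascale run, not figures from the talk.

```python
# Back-of-envelope estimate of MD trajectory output on a large machine.
# All parameters below are illustrative assumptions, not figures from
# the talk or from any specific Sierra run.

n_atoms        = 1.0e12  # assume a trillion-atom MD simulation
bytes_per_atom = 32      # positions + velocities + species/id, single precision
dump_every_ps  = 1.0     # assume one trajectory frame per simulated picosecond
sim_ns_per_day = 1.0     # assume ~1 ns of simulated time per wall-clock day

frames_per_day = sim_ns_per_day * 1000.0 / dump_every_ps  # 1000 ps per ns
bytes_per_day  = n_atoms * bytes_per_atom * frames_per_day

print(f"frames per day:     {frames_per_day:,.0f}")
print(f"trajectory per day: {bytes_per_day / 1e15:,.1f} PB")

# With these assumptions the run emits roughly 32 PB of trajectory per
# day, comparable to the full capacity of a large parallel file system,
# so nearly all frames must be discarded unless they are reduced or
# analyzed in situ.
```

Even at a sustained write rate of order 1 TB/s, roughly the class of a Sierra-era parallel file system, recording this stream would claim a large fraction of the machine's total I/O, and a few days of output would fill the disk.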
