High-throughput Data and New Representations for Models and Machine Learning

Gus Hart
Brigham Young University

Efforts to leverage computational materials science for meaningful materials discovery are driving rapid growth in materials data. This data represents a significant investment in both research time and infrastructure costs. It already has intrinsic worth because it contains new materials, some fraction of which will be candidates for new technologies. Direct searching of this data has already yielded some discoveries. But to fully capitalize on the investment and realize the data's potential, one must be able to effectively explore composition and structure space, a *vastly* larger space than that spanned by the data itself. In other words, we must find a way to interpolate effectively (in composition and structure space) between data points. Physically motivated models (e.g., DFT-trained classical potentials, cluster expansions) or machine learning approaches may be effective ways to achieve this interpolation. I explore several new ideas for representing crystal structure and chemical composition in a moderately coarse-grained way that may form a representation amenable to machine learning or physical model building. The key ideas are graph theory, graph isomorphisms, and invariant representations.
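As a rough illustration of the closing ideas (not the specific representation developed in this talk), the sketch below uses Weisfeiler-Lehman color refinement, a standard technique for building a fingerprint of a labeled graph that is invariant under node relabeling. Here the graph stands in for an atomic bonding network, with node labels playing the role of chemical species; all names and the choice of hash are illustrative assumptions.

```python
import hashlib
from collections import Counter

def wl_hash(adj, labels, rounds=3):
    """Weisfeiler-Lehman refinement: returns a digest that is identical
    for isomorphic labeled graphs (a necessary, though not sufficient,
    condition for isomorphism).

    adj:    {node: iterable of neighboring nodes}
    labels: {node: initial label, e.g. chemical species}
    """
    # Start from the node labels (the "chemistry" of each site).
    colors = {v: str(labels[v]) for v in adj}
    for _ in range(rounds):
        # Each node's new color combines its own color with the sorted
        # multiset of its neighbors' colors, then is compressed by hashing.
        colors = {
            v: hashlib.sha256(
                (colors[v] + "|" + ",".join(sorted(colors[u] for u in adj[v]))).encode()
            ).hexdigest()
            for v in adj
        }
    # The multiset of final colors does not depend on how nodes are named,
    # so hashing its sorted count list gives a relabeling-invariant digest.
    counts = sorted(Counter(colors.values()).items())
    return hashlib.sha256(str(counts).encode()).hexdigest()
```

For example, a triangle of two Si sites and one O site produces the same digest regardless of how its nodes are numbered, while a chain of the same three labeled sites produces a different one.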

Presentation (PDF File)

Back to Machine Learning for Many-Particle Systems