Machine-learning for materials and physics discovery through symbolic regression and kernel methods

Richard Hennig
University of Florida

Machine learning can provide surrogate models that aid the search for new materials, as well as new analytic equations that describe physical relationships. We will present an example of each type of learning: (i) the learning of kernel-based surrogate models for the accelerated exploration of energy landscapes and (ii) the data-driven discovery of the functional form of the superconducting critical temperature.
First, we present our kernel approach to developing surrogate machine-learning models for energy prediction. Using structurally and compositionally diverse materials generated with our genetic algorithm package GASP, together with their formation energies from density functional theory, we train interatomic potentials using support vector regression. We show that radial and angular distribution functions efficiently encode the relevant physical information into machine-readable inputs and obey the required constraints. We demonstrate how augmenting the training data with local energies improves model performance. These models can filter out low-value candidates, reducing the computational cost of the genetic algorithm by eliminating materials that have a high probability of being high in energy [1].
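As a rough illustration of the descriptor idea above, the following minimal sketch builds a Gaussian-smeared radial distribution function for a small, non-periodic, single-species cluster and compares two structures with a Gaussian kernel. It is not the GASP code or the authors' implementation: the function names, the parameters `r_max`, `n_bins`, `sigma`, and `gamma`, and the omission of periodic boundaries, species resolution, and angular terms are all assumptions made for brevity. Because the descriptor depends only on pair distances, it is unchanged when the atoms are translated or relabeled, which is one example of the kind of constraint mentioned above.

```python
import math


def radial_distribution(positions, r_max=6.0, n_bins=30, sigma=0.2):
    """Gaussian-smeared histogram of pair distances for a finite cluster.

    Hypothetical helper for illustration only: no periodic images,
    no species resolution, no angular information.
    """
    centers = [(k + 0.5) * r_max / n_bins for k in range(n_bins)]
    rdf = [0.0] * n_bins
    n_atoms = len(positions)
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            d = math.dist(positions[i], positions[j])
            if d > r_max:
                continue  # ignore pairs beyond the cutoff
            for k, c in enumerate(centers):
                rdf[k] += math.exp(-0.5 * ((d - c) / sigma) ** 2)
    # Normalize per atom so clusters of different size are comparable
    return [value / n_atoms for value in rdf]


def gaussian_kernel(a, b, gamma=1.0):
    """Similarity between two descriptor vectors (1.0 for identical inputs)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


# Illustrative coordinates only, not data from the talk
cluster = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0)]
stretched = [(0.0, 0.0, 0.0), (1.8, 0.0, 0.0), (0.0, 2.4, 0.0)]
similarity = gaussian_kernel(radial_distribution(cluster),
                             radial_distribution(stretched))
print(similarity)
```

Descriptor vectors of this kind could then be passed to a kernel regressor such as support vector regression to predict energies, and candidates whose predicted energy is high could be skipped before more expensive calculations.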
Second, predicting the critical temperature, Tc, of superconductors is a notoriously difficult task, even for electron-phonon systems. We build on earlier efforts by McMillan and by Allen and Dynes to model Tc from various measures of the phonon spectrum and the electron-phonon interaction, now using machine-learning algorithms. Specifically, we use symbolic regression, building on the SISSO framework, to identify a new, physically interpretable equation for Tc as a function of a small number of physical quantities. We show that our model, trained on the relatively small dataset tested by Allen and Dynes, improves upon the Allen-Dynes fit and generalizes reasonably well to superconducting materials with higher Tc, such as H3S and LaH10. By incorporating physical insights and constraints into a data-driven approach, we demonstrate that machine-learning methods can identify the relevant physical quantities and obtain predictive equations using small but high-quality datasets [2].
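For context on the baseline mentioned above, here is a minimal sketch of the widely quoted simplified Allen-Dynes expression, Tc = (omega_log / 1.2) exp[-1.04 (1 + lambda) / (lambda - mu* (1 + 0.62 lambda))]. It is the earlier fit that the talk compares against, not the new equation obtained by symbolic regression, and it omits the strong-coupling correction factors of the full Allen-Dynes treatment. The function name, argument names, and the default mu* = 0.1 are assumptions chosen for illustration; omega_log is the logarithmic average phonon frequency, lambda the electron-phonon coupling constant, and mu* the Coulomb pseudopotential.

```python
import math


def allen_dynes_tc(omega_log, lam, mu_star=0.1):
    """Simplified Allen-Dynes estimate of Tc.

    The result is in the same units as omega_log (for example kelvin).
    Hypothetical helper for illustration; no strong-coupling corrections.
    """
    denominator = lam - mu_star * (1.0 + 0.62 * lam)
    if denominator <= 0.0:
        # The expression is not meaningful here; treat as no superconductivity
        return 0.0
    return (omega_log / 1.2) * math.exp(-1.04 * (1.0 + lam) / denominator)


# Illustrative inputs only, not data from the talk
print(allen_dynes_tc(omega_log=100.0, lam=1.0, mu_star=0.1))
```

A symbolic-regression search of the kind described above would explore alternative functional forms in the same few input quantities and keep those that fit the data best while remaining physically interpretable.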

[1] S. Honrao, B. E. Anthonio, R. Ramanathan, J. J. Gabriel, and R. G. Hennig, Comp. Mater. Sci. 158, 414 (2019), https://doi.org/10.1016/j.commatsci.2018.08.041.
[2] S. R. Xie, G. R. Stewart, J. J. Hamlin, P. J. Hirschfeld, and R. G. Hennig, arXiv:1905.06780 (2019), https://arxiv.org/abs/1905.06780.

Stephen Xie (1,2), Shreyas Honrao (1,3), and Richard G. Hennig (1,2,3)
(1) Department of Materials Science and Engineering, University of Florida, Gainesville, Florida
(2) Quantum Theory Project, University of Florida, Gainesville, Florida
(3) Department of Materials Science and Engineering, Cornell University, Ithaca, New York


Workshop I: From Passive to Active: Generative and Reinforcement Learning with Physics