Exploring multi-million compound spaces with chemical accuracy using machine learning

Heather Kulik
Massachusetts Institute of Technology

I will discuss our efforts to use machine learning (ML) to accelerate the computational tailoring and design of transition metal complexes and metal-organic framework (MOF) materials in spaces of millions to tens of millions of materials. In a challenging materials space such as open-shell 3d transition metal chemistry, one limitation is that ML models and ML-accelerated high-throughput screening have traditionally relied on density functional theory (DFT) for data generation, yet DFT is both computationally demanding and prone to errors that limit its accuracy in predicting new materials. I will describe three ways we’ve overcome these limitations: i) through efficient global optimization that minimizes the number of calculations needed, yielding design rules in weeks instead of decades while satisfying multiple objectives, including ones regarding the validity of the electronic structure calculations; ii) through a machine-learned consensus across dozens of DFT functionals to uncover new materials more robustly; and iii) through the development of a density functional "recommender" that identifies the most accurate mean field theory for a given compound. Time permitting, I will also describe how we have leveraged natural language processing to extract, learn, and directly predict experimental measures of stability for heterogeneous MOF materials.
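The consensus idea in ii) can be illustrated with a minimal sketch. It assumes each candidate already has one predicted property value per functional (for example, from per-functional ML models) and keeps a candidate only if enough functionals agree that it meets a design target. The function name `consensus_hits`, the functional labels, the compound names, the spin-splitting values, the target window, and the 50% agreement threshold are all hypothetical illustrations, not the method or data from the talk.

```python
"""Hypothetical sketch: consensus screening across several DFT functionals.

Assumes each candidate already has one predicted property value per
functional (e.g. from per-functional ML models); all names and numbers
below are illustrative, not data from the talk.
"""


def consensus_hits(predictions, lower, upper, min_agreement=0.5):
    """Keep candidates whose property lies in [lower, upper] for at least
    `min_agreement` (a fraction) of the functionals.

    predictions: {candidate: {functional_name: predicted_value}}
    Returns {candidate: fraction_of_functionals_in_window}.
    """
    hits = {}
    for candidate, by_functional in predictions.items():
        # One True/False vote per functional: is the value inside the window?
        in_window = [lower <= v <= upper for v in by_functional.values()]
        if not in_window:
            continue  # no predictions for this candidate; nothing to vote on
        agreement = sum(in_window) / len(in_window)
        if agreement >= min_agreement:
            hits[candidate] = agreement
    return hits


# Hypothetical spin-splitting energies (kcal/mol) for two made-up complexes.
example = {
    "complex_A": {"B3LYP": 2.1, "PBE0": -3.4, "TPSSh": 4.0, "M06": 8.2, "PBE": -12.0},
    "complex_B": {"B3LYP": 9.5, "PBE0": 14.0, "TPSSh": 3.0, "M06": 11.0, "PBE": -1.0},
}

# Hypothetical target window near zero (a rough spin-crossover-like criterion).
print(consensus_hits(example, lower=-5.0, upper=5.0))
```

Here complex_A falls inside the window for three of five functionals (agreement 0.6) and is kept, while complex_B reaches only 0.4 and is dropped. Requiring agreement from a fraction of functionals, rather than trusting any single one, is one simple way to express robustness to functional choice; a real workflow would likely also weigh model uncertainty.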


Workshop I: Increasing the Length, Time, and Accuracy of Materials Modeling Using Exascale Computing