Exploring multi-million compound spaces with chemical accuracy using machine learning

Heather Kulik
Massachusetts Institute of Technology

I will discuss our efforts to use machine learning (ML) to accelerate the computational tailoring and design of transition metal complexes and metal-organic framework (MOF) materials in spaces of millions to tens of millions of compounds. In a challenging materials space such as open-shell 3d transition metal chemistry, one limitation is that ML models and ML-accelerated high-throughput screening traditionally rely on density functional theory (DFT) for data generation, yet DFT is both computationally demanding and prone to errors that limit its accuracy in predicting new materials. I will describe three ways we have overcome these limitations: i) efficient global optimization that minimizes the number of calculations needed, yielding design rules in weeks instead of decades while satisfying multiple objectives concerning the validity of the electronic structure calculations; ii) machine-learned consensus from dozens of DFT functionals to uncover new materials more robustly; and iii) a density functional "recommender" that identifies the most accurate mean-field theory for a given compound. Time permitting, I will also describe how we have leveraged natural language processing to extract, learn, and directly predict experimental measures of stability for heterogeneous MOF materials.

Presentation (PDF File)
