Luca M. Ghiringhelli¹, Jan Vybiral², Sergey V. Levchenko¹,
Claudia Draxl³, and Matthias Scheffler¹
¹ Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin-Dahlem, Germany
² Charles University, Department of Mathematical Analysis, Prague, Czech Republic
³ Humboldt-Universitaet zu Berlin, Institut fuer Physik and IRIS Adlershof, Berlin, Germany
Statistical learning of materials properties or functions has so far started with a largely silent, unchallenged step: the choice of the set of descriptive parameters (termed the descriptor). However, when the scientific connection between the descriptor and the actuating mechanisms is unclear, the causality of the learned descriptor-property relation is uncertain. Consequently, trustworthy prediction of new promising materials, identification of anomalies, and scientific advancement are all in doubt. We analyze this issue and define requirements for a suitable descriptor. For a classic example, the energy difference between zincblende/wurtzite and rocksalt semiconductors, we demonstrate how a meaningful descriptor can be found systematically.
(*) Submitted for publication in Physical Review Letters. See also