How FAIR are data repositories in materials science?

Claudia Draxl

Claudia Draxl1,2, Peter Wittenburg3, and Matthias Scheffler2,1
1Humboldt-Universität zu Berlin, Berlin, Germany
2Fritz Haber Institute of the Max Planck Society, Berlin, Germany
3Max-Planck Computing and Data Facility, Garching, Germany

Data from simulations and experiments are growing beyond what established methods can handle. The “4 V challenge” of Big Data, namely Volume (the amount of data), Variety (the heterogeneity of form and meaning of data), Velocity (the rate at which data may change or new data arrive), and Veracity (uncertainty of quality), is clearly becoming evident in materials science as well. Bringing these data under control sets the stage for exploration and discovery. Novel data-analytics tools can find patterns and correlations in data that cannot be seen by the human eye. In fact, data-driven materials research is adding a new research paradigm to our scientific landscape. However, without a proper data infrastructure that allows for collecting and sharing data, data-driven materials science will be hampered. What is now known in this context as the FAIR Guiding Principles [1] means the following: data are Findable by anyone interested; they are stored in such a way that they are easily Accessible; their representation follows accepted standards, and all specifications are open, hence data are Interoperable. All this enables data to be used for research questions that differ from the purpose for which they were created; hence data are Re-purposable / Re-usable.

So, how FAIR are materials science data and their repositories?
Concerning the F and A, I will predominantly discuss the main data collections in computational materials science. Most crucial for the I, but also for the R and F, are (i) a proper description of the data, i.e., the metadata, (ii) a unified representation, and (iii) the assessment of data quality [2]. The NOMAD (Novel Materials Discovery) Laboratory [3] addresses all these issues [4]. In fact, the concept behind the NOMAD Repository [5], which was opened in 2014, was developed independently of and in parallel to the FAIR Principles [1].

[1] Mark D. Wilkinson, et al., Sci. Data 3, 160018 (2016).
[2] C. Carbogno, K. S. Thygesen, B. Bieniek, C. Draxl, L. Ghiringhelli, A. Gulans, O. T. Hofmann, K. W. Jacobsen, S. Lubeck, J. J. Mortensen, M. Strange, Numerical Quality Control for DFT-based Materials Databases, preprint.
[4] C. Draxl and M. Scheffler, NOMAD: The FAIR Concept for Big-Data-Driven Materials Science, MRS Bulletin 43, 676 (2018).

Back to Workshop II: HPC and Data Science for Scientific Discovery