Accuracy of Data

I’ve just been made aware of this project. I work with crystal materials for optical applications. I need to ask a simple question, and to put it bluntly: why are the results that I look up consistently incorrect? Let’s just consider density. Checking CaF2, BaF2, Ge and Si, the calculated densities are always too low. The ‘Help’ tab gives a clue, suggesting they are low by about 3%, and that seems true, so my question then morphs into: why are they low? If I move on to other properties I’m interested in, like the Poisson ratio, I find that I cannot trust anything I see here. I am sure I’m missing a vital point, but can someone please put me straight? Thanks. Keith M

Hi @Keith_Matthews , the accuracy of DFT calculations for different properties is a complicated topic. Due to the scale of the computed data we serve and the diversity of chemical spaces that we cover (basically the entire periodic table), we choose good “general purpose” calculation methods that balance accuracy and efficiency, and that means that specific properties for specific materials can occasionally be quite inaccurate.

In general, the value in MP data is the fact that it enables large comparisons over many materials across chemical space. That’s made possible because all the calculations are done in a consistent way, so even when there are errors w.r.t. experiment, the errors are typically (but not always) systematic. If you are looking for the most quantitatively accurate property data available for specific material systems, our database might not be the best place to look.

I’d encourage you to read the Methodology section in our documentation for more information on this. With respect to density specifically, others are more knowledgeable than I am, but it is known that PBE (the DFT functional we use in our calculations) underbinds many systems and results in lattice constants that are a few percent too large (hence the density discrepancy you mentioned).
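To see how a small lattice-constant error turns into a ~3% density error, here is a rough back-of-the-envelope sketch (the CaF2 numbers are approximate literature values, not taken from the MP database, and the 1% overestimate is just a typical PBE magnitude, not a fitted figure):

```python
# Sketch: how a lattice-constant overestimate propagates to density.
# For a cubic cell, density scales as 1/a^3, so a lattice constant
# that is ~1% too large gives a density roughly 3% too low.

a_exp = 5.463    # CaF2 (fluorite) lattice constant, angstroms (approx. literature value)
rho_exp = 3.18   # CaF2 experimental density, g/cm^3 (approx. literature value)

# Assume PBE overestimates the lattice constant by 1% (illustrative only).
a_pbe = a_exp * 1.01

# Density is inversely proportional to cell volume (a^3 for a cubic cell).
rho_pbe = rho_exp * (a_exp / a_pbe) ** 3

error_pct = 100 * (rho_pbe - rho_exp) / rho_exp
print(f"PBE-style density: {rho_pbe:.2f} g/cm^3 ({error_pct:.1f}% vs experiment)")
```

So the ~3% density shortfall is exactly what you would expect from lattice constants that are about 1% too large, which is why the error shows up so consistently across different materials.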

We are in the process of transitioning to a methodology that uses a newer and usually much more accurate DFT functional. See this preprint for details.


Hi @Keith_Matthews, welcome!

Honestly this is an excellent question, and very important. We wrote a brief editorial that covers some of these issues.

@rkingsbury’s response is exactly right – the true value, due to systematic errors, is in the data set in aggregate. The Materials Project database has been successful in screening to find materials for new applications, leading to successful synthesis and characterization. With that said, we do ourselves no favors if we falsely present the data as the most accurate available.

There are a few key factors at play here:

  1. This is computational data. Some familiarity with the methods is essential to understand the systematic errors present (especially in band gaps / excited states, but also in lattice parameters as you observed). Most errors are systematic in nature, but there are additional errors for some materials, perhaps due to the fundamental physics at play, or because of a numerical issue. For the latter, we do what we can to catch these, but this kind of calculation-at-scale is difficult (Materials Project was pioneering here, but this is by no means a solved problem).

  2. These are not the best computational methods available. We are constantly walking a tightrope between feasibility (performing the calculations at scale with the computing budget we have available) and accuracy (the latest and greatest methods). The best advice here is, for every predicted property (such as elastic constants), to refer to the underlying peer-reviewed publication, which details what experimental validation we perform and can give a sense of the accuracy of the data.

  3. Development of better computational methods is also limited by the amount of experimental data available. If we had more full elastic tensors (not just bulk moduli) or high-quality calorimetry for more materials, it would be incredibly useful. But these kinds of experiments are often difficult, time-consuming, and not rewarded as they should be in our current scientific climate, and so we’re limited in the amount of validation we can do.

Ultimately, we have to start somewhere. The hope with the Materials Project is that, by building this machinery to do calculations of materials at scale, we can iteratively improve the quality of our predictions over the years as new methods become more accessible. Indeed, as @rkingsbury mentions, this is now happening with our new calculations with the SCAN functional, which will improve both our predictions of formation enthalpy and lattice parameters.

Hope this helps,



Thank you both. I appreciate your answers. Your explanations match the conclusions that I had arrived at for myself; I just needed to check my thinking was correct. One of my responsibilities for my company is presenting the most reliable physical data that I can for the range of optical materials we offer. That is not easy where some materials have been in use for 100+ years and the original references for commonly accepted data are lost in history. My interest in MP came from a customer who introduced me to it during a discussion about the stiffness coefficients C11 etc.

I’ll follow the project with interest. Good luck for the future.

