Is it OK to post snapshots of Materials Project structures on e.g. FigShare (i.e. for manuscript reproducibility)?

From the terms of use page:

Your use of this web site is subject to the following terms and conditions:

This site is operated by Lawrence Berkeley National Laboratory (LBNL). By using this site, you agree to abide by the terms of the LBNL Privacy and Security notice and the Materials Project Privacy Policy. You also acknowledge that the data in this database is computed, and may not be accurate enough for your application. You agree not to hold the developers, contributors, and hosts of the Materials Project liable for any accuracies in the data, or consequences thereof. You agree to not scrape the website directly. You may collect large fractions of the database for analysis via the Materials Project API and to present processed results with proper attribution. If you plan to download large datasets, email [email protected] with the email address associated with your account and with your use case so that we can avoid flagging your account as abusing the service. We may also suggest an efficient way for you to obtain the data you need.

License

By downloading Content from Materials Project, User agrees to accept the Creative Commons Attribution 4.0 License implying that the Content may be copied, distributed, transmitted, and adapted, without obtaining specific permission from the Materials Project, provided proper attribution is given to the Materials Project.

The data on the Materials Project has been created at Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231 with the U.S. Department of Energy, supplemented by spectroscopic data under created by the Data Infrastructure Building Blocks (DIBBS) Local Spectroscopy Data Infrastructure (LSDI) project funded by National Science Foundation (NSF), under Award Number 1640899.

The U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the data, or allow others to do so, for U.S. Government purposes.

I’m guessing the answer is yes based on matminer including snapshots, but maybe care needs to be taken not to post the CIF files directly? For example, saving as pymatgen Structure objects (or equivalent) before saving the snapshot?

Any other considerations that need to be taken into account?

Hi @sgbaird,

The answer is yes, based on the terms of our license, and this (in my view) is essential for scientific reproducibility.

However, we do want to avoid a situation in which researchers use out-of-date datasets from different places rather than the canonical, up-to-date data available on Materials Project. We can and do detect errors and issues with the data, which are corrected over time, and this is partly why we encourage users to cite the specific database version used when publishing research. If a user pulled MP data from FigShare or elsewhere, rather than from MP directly, for a follow-on work, this would not be ideal.

Separately, I have also encountered a situation where someone has snapshotted MP data, and then someone else has cited the snapshot as the origin of the data rather than citing MP.

Therefore, out of politeness, I would suggest including a warning that the data is snapshotted for the purposes of reproducibility, that the latest data should be retrieved directly from the Materials Project, and that any use of the data should include a Materials Project citation. (Indeed, this is the “BY” requirement of the “CC BY 4.0” license.)

Hope this helps and sounds reasonable!

Matt


To follow up, I would certainly save the Structure object rather than the CIF, but simply because this is the canonical representation of the data in our database.
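
For anyone wanting a starting point, here is a minimal sketch of how such a snapshot could be packaged. It bundles the serialized structures with the reproducibility warning, the license, and the database version in one JSON file. The function names, the file layout, and the wording of the notice are my own assumptions, not an official Materials Project format. The structures are assumed to already be plain dicts, for example as returned by pymatgen's `Structure.as_dict()`.

```python
import json
from pathlib import Path

# Hypothetical wording; adapt it to your own snapshot.
NOTICE = (
    "Snapshot of Materials Project data, archived for reproducibility only. "
    "Retrieve the latest data directly from the Materials Project, and cite "
    "the Materials Project when using this data (CC BY 4.0)."
)


def write_snapshot(structures, database_version, path):
    """Write structure dicts plus provenance metadata to a JSON file.

    structures: maps a material ID to a JSON-serializable dict
        (assumed to come from something like Structure.as_dict()).
    database_version: the database version string you are citing.
    """
    payload = {
        "notice": NOTICE,
        "license": "CC BY 4.0",
        "database_version": database_version,
        "structures": structures,
    }
    Path(path).write_text(json.dumps(payload, indent=2))
    return payload


def read_snapshot(path):
    """Load a snapshot written by write_snapshot and return it as a dict."""
    return json.loads(Path(path).read_text())
```

With pymatgen installed, each entry could be produced with `structure.as_dict()` and rebuilt later with `Structure.from_dict(...)`. Keeping the notice and the version inside the same file means they travel with the data if someone downloads only the snapshot.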