Materials Project database: getting deprecated structures

Hey,
I am training a machine learning model on the Materials Project database, and it seems that the performance of my model has decreased. In order to check whether I messed up my model, I am trying to download the old data, including the deprecated data points. Is this still possible? I tried to query the structures like this:

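from pymatgen.ext.matproj import MPRester

# self.apikey holds the Materials Project API key.
with MPRester(self.apikey) as m:
    # Ask only for entries flagged as deprecated.
    query = m.query(
        criteria={"deprecated": True},
        properties=["deprecated"],
    )
    for q in query:
        if q["deprecated"]:
            print("deprecated!")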

but it seems like there are no deprecated structures in the data, because nothing is printed.

Hi @Stefaan,

It should be possible; however, I see the same issue. It may be that the deprecated flag is not whitelisted and so is ignored when you perform the query.
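One way to check from the client side whether a criterion was silently ignored is to tally the flag values that actually come back. The following is only a minimal sketch and not part of pymatgen: `tally_flag` is a hypothetical helper, and the sample documents are made up to stand in for a real query result.

```python
from collections import Counter


def tally_flag(docs, key="deprecated"):
    """Count how often each value of `key` occurs in a list of result dicts."""
    return Counter(doc.get(key) for doc in docs)


# Made-up stand-in for the list of dicts returned by a query
# that asked for deprecated == True.
sample_docs = [
    {"material_id": "example-1", "deprecated": False},
    {"material_id": "example-2", "deprecated": False},
    {"material_id": "example-3"},  # field missing entirely
]

counts = tally_flag(sample_docs)
print(counts)

# If documents come back but none of them carries deprecated == True,
# the criterion was most likely dropped before the query ran.
filter_ignored = len(sample_docs) > 0 and counts[True] == 0
print(filter_ignored)
```

If the tally shows only False (or missing) values for a query that asked for True, that is consistent with the filter being ignored rather than with the database containing no deprecated entries.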

If you’re using a newer version of pymatgen, you can also check the log in your ~/.pmgrc.yaml to see which database versions you have been using; perhaps the database version changed after you trained your model.

If this is causing a specific problem for your work, let me know the exact database version and fields you need, and I can send you a database dump from an older version.

Note that in our upcoming API you will be able to query for data from a given database version to avoid this issue, but this is not yet publicly available. We’re working hard on it, however :)

Hope this helps,

Matt

I’ve confirmed the deprecated key was not whitelisted; this has been fixed and will be live in our next release (~weeks).

Hi @mkhorton,

thanks a lot for your reply and for your nice work!
I was trying to reproduce our own benchmark from a paper released in 2018. Do you think it is still possible to get the database at version 2.0.0 (release date: 04/13/2016)? I think this must have been the data that was used for our benchmark model. I would need the structures and the “formation energy per atom”.

Best regards,
Stefaan

That version from 2016 is pretty far back for us, and I’m not sure I have it to hand. All the tasks from back then should still be available via the API if you wanted to reconstruct the formation energies, but it’d be a bit of an effort. I’ll check, however.

We have a new release coming soon with an updated compatibility scheme, meaning better formation energies, so you may prefer to wait for that and re-train with that data.

There is also a data dump from 2018, made available by some collaborators (Graphs of materials project), where they archived data for training their own ML models. I’d definitely encourage creating an archive of training data prior to publishing a model, because the up-front values reported by the Materials Project do change as new calculations come in. Even if data for individual calculations remains available, it can be an effort to reconstruct derived information like formation energies.
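As an illustration of archiving training data, the sketch below writes the records to a JSON file together with the database version, a retrieval timestamp and a checksum. It is only one possible layout under stated assumptions: `archive_training_data`, the field names of the snapshot, the version string and the sample record are all hypothetical and not a Materials Project convention.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone


def archive_training_data(records, path, database_version):
    """Write records plus provenance metadata to JSON; return the SHA-256 of the records."""
    payload = json.dumps(records, sort_keys=True)
    checksum = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    snapshot = {
        "database_version": database_version,  # as reported when the data was pulled
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "sha256": checksum,
        "records": records,
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(snapshot, handle, indent=2, sort_keys=True)
    return checksum


# Made-up record standing in for real query results.
records = [{"material_id": "example-1", "formation_energy_per_atom": -0.5}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "training_snapshot.json")
    checksum = archive_training_data(records, path, "example-version")
    with open(path, encoding="utf-8") as handle:
        restored = json.load(handle)

print(restored["database_version"], restored["sha256"] == checksum)
```

Storing the checksum next to the records makes it possible to confirm later that the archived file still matches the data the model was trained on.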

Best,

Matt

Hi @mkhorton,

thanks for your reply! I started a training session based on the 2018 dataset and this seems to reproduce the benchmark pretty well. So there is no need for the old 2016 data. Thanks a lot for your help!

Best, Stefaan

Glad to hear it! Our new release went live yesterday too, so you may find our new data better as well. Also on the horizon is the release of our new formation energy scheme, which will hopefully bring our predictions that much closer to experiment.

Hi Matt,

I am currently running a high-throughput project on ternary hydrides and also ran into the issue of some structures being deprecated in the newest database (pymatgen 2022.0.16, db 2020_09_08). I have tried to query these structures by setting "deprecated": True in the criteria when using the query function (as Stefaan did in the initial code), but the results seem to show only the existing structures, i.e. "deprecated": False. Below is my test code:

from pymatgen.ext.matproj import MPRester

excluded_atoms_in_cation = [
    "C", "N", "O", "F", "Cl", "Br", "I", "He", "Ne", "Ar", "Kr", "Xe",
    "Pm", "Ac", "Th", "Pa", "U", "Np", "Pu",
]

criteria = {
    "elements": {"$all": ["H"], "$nin": excluded_atoms_in_cation},
    "nelements": 3,
    "deprecated": True,
}
with MPRester() as mpr:
    entries = mpr.query(
        criteria=criteria,
        properties=["material_id", "pretty_formula", "formation_energy_per_atom", "deprecated"],
    )
print(len(entries))

The number of entries does not change whether or not the “deprecated” tag is set.

I am quite confident there are deprecated structures that meet these search criteria, e.g. Li2MgH4, mp-1104241, and would therefore like some help with how to access this information properly.

Best Regards,
Jiaxin

It seems like this situation has changed in the meantime? According to the docs

… in the interests of transparency all old calculations remain accessible via our programmatic API or via direct access on the website.

but actually I am unable to get data on deprecated IDs:

MPRester().query(
    {"task_ids": "mp-1206699"},
    ["material_id", "deprecated", "task_ids", "formula_pretty", "e_above_hull"],
)

[{'material_id': 'mp-1206699',
  'deprecated': True,
  'task_ids': ['mp-1206699'],
  'formula_pretty': None,
  'e_above_hull': None}]

Same thing on https://materialsproject.org. Direct links to materials just show this:

This material has been deprecated.

You can obtain its previously shown data via our API by direct ID reference.
Its data is no longer considered in searches or aggregations.
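To spot such cases in a larger batch of results, one could separate documents that carry data from those where every requested data field is None. This is just a sketch: `split_hollow` is a hypothetical helper, the first document is copied from the response above, and the second one is made up for contrast.

```python
def split_hollow(docs, data_fields):
    """Separate documents that carry data from those where every data field is None."""
    with_data, hollow = [], []
    for doc in docs:
        if all(doc.get(field) is None for field in data_fields):
            hollow.append(doc)
        else:
            with_data.append(doc)
    return with_data, hollow


docs = [
    # Copied from the response shown above.
    {"material_id": "mp-1206699", "deprecated": True, "task_ids": ["mp-1206699"],
     "formula_pretty": None, "e_above_hull": None},
    # Made-up document for contrast.
    {"material_id": "example-1", "deprecated": False, "task_ids": ["example-1"],
     "formula_pretty": "XY", "e_above_hull": 0.0},
]

with_data, hollow = split_hollow(docs, ["formula_pretty", "e_above_hull"])
print([doc["material_id"] for doc in hollow])
```

Note that the check uses `is None` rather than truthiness, so a legitimate value of 0.0 for `e_above_hull` still counts as data.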

I will put a note in here that deprecated data retrieval is working with the new API client. Just add deprecated=True as an input parameter to the query method.

from mp_api import MPRester

# Retrieve data for all deprecated materials
with MPRester() as mpr:
    docs = mpr.query(deprecated=True, fields=["material_id", 
                                              "deprecated", 
                                              "task_ids", 
                                              "formula_pretty", 
                                              "energy_above_hull"])

Additionally, updated data on the new website/API shows mp-1206699 as now undeprecated (Materials Project - Materials Explorer - mp-1206699).

– Jason

@munrojm I haven’t converted any code to the new API yet. Is it stable now? Do I understand correctly that there is no known way to get deprecated materials from the old API?

Hi @janosh, sorry for the late reply. Things are fairly stable with the new API client. However, I would maybe hold off a little bit longer until it is closer to public release if your stuff is mission critical. You should still be able to use the existing API if you have the task_id for the deprecated calculation.

– Jason