Hey,
I am training a machine learning model on the Materials Project database, and it seems that the performance of my model has decreased. In order to check whether I messed up my model, I am trying to download the old data, including the deprecated data points. Is this still possible? I tried to query the structures like this:
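```python
from pymatgen.ext.matproj import MPRester  # legacy Materials Project client

api_key = "YOUR_API_KEY"  # replace with your Materials Project API key

with MPRester(api_key) as m:
    # Ask the server for deprecated materials only
    query = m.query(
        criteria={"deprecated": True},
        properties=["deprecated"],
    )
    for q in query:
        if q["deprecated"]:
            print("deprecated!")
```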
but it seems like there are no deprecated structures in the data, because nothing is printed.
It should be possible; however, I see the same issue. It may be that the deprecated flag is not whitelisted and so is ignored when you perform the query.
If you’re using a newer version of pymatgen, you can also check the log in your ~/.pmgrc.yaml to see which database versions you have been using and how often; perhaps the database version changed after you trained your model.
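As a rough sketch of that check: recent legacy pymatgen releases appear to record the database versions that `MPRester` connects to in `~/.pmgrc.yaml`, under a `MAPI_DB_VERSION` entry holding a `LAST_ACCESSED` version and a `LOG` of per-version connection counts. That layout is an assumption here, so the helpers fall back to empty values if the keys are missing. Loading the file needs PyYAML; the parsing helpers themselves only work on an already-loaded dict, and all function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def load_pmgrc(path: Path | None = None) -> dict[str, Any]:
    """Load the pymatgen settings file (needs PyYAML installed)."""
    import yaml  # imported lazily so the helpers below work without PyYAML

    path = path or Path.home() / ".pmgrc.yaml"
    with open(path) as f:
        return yaml.safe_load(f) or {}


def db_version_usage(settings: dict[str, Any]) -> tuple[str | None, dict[str, int]]:
    """Return (last accessed DB version, {DB version: number of connections}).

    Assumes the MAPI_DB_VERSION / LAST_ACCESSED / LOG layout; missing keys
    yield (None, {}).
    """
    entry = settings.get("MAPI_DB_VERSION") or {}
    return entry.get("LAST_ACCESSED"), dict(entry.get("LOG") or {})


def most_used_version(settings: dict[str, Any]) -> str | None:
    """Return the DB version with the highest connection count, if any."""
    _, log = db_version_usage(settings)
    return max(log, key=log.get) if log else None
```

Typical use would be `last, counts = db_version_usage(load_pmgrc())`; comparing `last` with the version you trained on shows whether the data changed underneath the model.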
If this is causing a specific problem for your work, let me know the exact database version and fields you need and I can send you a database dump from an older version.
Note that our upcoming API will let you query data from a given database version to avoid this issue, but it is not yet publicly available. We’re working hard on it, however.
Thanks a lot for your reply and for your nice work!
I was trying to reproduce our own benchmark from a paper released in 2018. Do you think it is still possible to get the database at version 2.0.0 (release date: 04/13/2016)? I think this must have been the data that was used for our benchmark model. I would need the structures and the “formation energy per atom”.
That version from 2016 is pretty far back for us; I’m not sure I have it to hand. All the tasks from back then should still be available via the API if you wanted to reconstruct the formation energies, but it’d be a bit of an effort. I’ll check, however.
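To make “reconstructing the formation energies” concrete: the formation energy per atom of a compound is its total energy minus the energies of its constituent elements in their reference phases, divided by the number of atoms, i.e. E_f = (E_total − Σ n_i μ_i) / Σ n_i. A minimal sketch, assuming you have already retrieved the compound’s total energy and a per-atom reference energy μ_i for each element (for example the lowest-energy elemental entry from the same database version). It deliberately applies none of the energy corrections (compatibility scheme) the Materials Project uses, so its numbers will not match reported values unless the same corrections are applied first.

```python
from __future__ import annotations

from typing import Mapping


def formation_energy_per_atom(
    total_energy: float,
    composition: Mapping[str, float],
    elemental_refs: Mapping[str, float],
) -> float:
    """Formation energy per atom, E_f = (E_total - sum_i n_i * mu_i) / sum_i n_i.

    total_energy   : total energy (eV) of the cell described by `composition`
    composition    : element -> number of atoms in that cell
    elemental_refs : element -> reference energy per atom (eV/atom)
    No compatibility/energy corrections are applied here.
    """
    n_atoms = sum(composition.values())
    if n_atoms <= 0:
        raise ValueError("composition must contain at least one atom")
    missing = set(composition) - set(elemental_refs)
    if missing:
        raise KeyError(f"no reference energy for: {sorted(missing)}")
    reference = sum(n * elemental_refs[el] for el, n in composition.items())
    return (total_energy - reference) / n_atoms
```

For example, a hypothetical Li2O cell with a total energy of -12.0 eV and references of -2.0 eV/atom (Li) and -5.0 eV/atom (O) gives (-12.0 - (-9.0)) / 3 = -1.0 eV/atom.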
We have a new release coming soon with an updated compatibility scheme, meaning better formation energies, so you may prefer to wait for that and re-train with that data.
There is also a 2018 data dump from some collaborators, “Graphs of Materials Project”, in which they archived data for training their own ML models. I’d definitely encourage creating an archive of training data before publishing a model, because the values reported up front by the Materials Project do change as new calculations come in. Even if the data for individual calculations remains available, it can be an effort to reconstruct derived information such as formation energies.
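Following that advice, one lightweight way to archive a training set is to write the exact records you trained on, together with the database version they came from, to a compressed JSON file. A minimal sketch using only the standard library; the record layout (`material_id`, `structure`, `formation_energy_per_atom`) and the function names are hypothetical, and pymatgen structures would first need converting with `Structure.as_dict()` so that they are JSON-serializable.

```python
from __future__ import annotations

import gzip
import json
from datetime import date
from pathlib import Path
from typing import Any


def archive_training_data(
    records: list[dict[str, Any]], db_version: str, path: str | Path
) -> None:
    """Write training records plus provenance to a gzipped JSON file."""
    payload = {
        "database_version": db_version,  # e.g. from the legacy MPRester.get_database_version()
        "archived_on": date.today().isoformat(),
        "n_records": len(records),
        "records": records,
    }
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(payload, f)


def load_training_archive(path: str | Path) -> dict[str, Any]:
    """Read back an archive written by archive_training_data()."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Storing the database version next to the records means a later benchmark can be matched to the exact data release rather than to whatever the API currently reports.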
Thanks for your reply! I started a training session based on the 2018 dataset and it seems to reproduce the benchmark pretty well, so there is no need for the old 2016 data. Thanks a lot for your help!
Glad to hear it! Our new release went live yesterday too, so you may find our new data better as well. On the imminent horizon will also be the release of our new formation energy scheme which will hopefully make our predictions that much closer to experiment too.
I am currently running a high-throughput project on ternary hydrides and have also encountered the issue of some structures being deprecated in the newest database (pymatgen 2022.0.16, db 2020_09_08). I have tried to query these structures by setting "deprecated": True in the criteria of the query function (just as Stefaan did in the initial code), but the results seem to show only the existing structures, i.e. "deprecated": False. Below is my testing code:
The number of entries doesn’t change whether or not the “deprecated” tag is set.
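One way to confirm that the `deprecated` criterion is being silently dropped (rather than simply matching nothing new) is to run the same query with and without it, compare the returned IDs, and count how many documents in the “deprecated only” result are actually not deprecated. A minimal sketch operating on the lists of result dicts returned by the legacy `MPRester.query`; it assumes `material_id` and `deprecated` were requested in `properties`, and the function name is hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterable


def deprecated_filter_report(
    with_flag: Iterable[dict[str, Any]], without_flag: Iterable[dict[str, Any]]
) -> dict[str, Any]:
    """Summarise two query results to see whether a deprecated=True filter took effect."""
    with_flag = list(with_flag)
    ids_with = {d["material_id"] for d in with_flag}
    ids_without = {d["material_id"] for d in without_flag}
    # A query that asked for deprecated=True should return no non-deprecated docs
    n_not_deprecated = sum(1 for d in with_flag if not d.get("deprecated"))
    return {
        "n_with_flag": len(ids_with),
        "n_without_flag": len(ids_without),
        "same_ids": ids_with == ids_without,
        "n_not_deprecated_in_flagged_query": n_not_deprecated,
        "filter_likely_ignored": n_not_deprecated > 0,
    }
```

If `same_ids` is true and `filter_likely_ignored` is true, the server most likely ignored the criterion, which matches the “number of entries doesn’t change” symptom.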
I am quite confident there are deprecated structures that meet these search criteria, e.g. Li2MgH4 (mp-1104241), so I would like some help with how to access this information properly.
I’ll note here that deprecated data retrieval works with the new API client: just add deprecated=True as an input parameter to the query method.
```python
from mp_api import MPRester

# Retrieve data for all deprecated materials
with MPRester() as mpr:
    docs = mpr.query(deprecated=True, fields=["material_id",
                                              "deprecated",
                                              "task_ids",
                                              "formula_pretty",
                                              "energy_above_hull"])
```
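Since the older API can still serve a deprecated calculation if you know its task_id, a natural next step is to turn the documents returned above into a lookup from material_id to task_ids. A minimal sketch; it assumes each returned document exposes `material_id` and `task_ids` either as attributes (the new client returns document objects) or as dict keys, and the helper names are hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterable


def _field(doc: Any, name: str) -> Any:
    """Read a field from a dict-like or attribute-style document."""
    if isinstance(doc, dict):
        return doc.get(name)
    return getattr(doc, name, None)


def task_ids_by_material(docs: Iterable[Any]) -> dict[str, list[str]]:
    """Map each material_id to its task_ids (stringified, de-duplicated, sorted)."""
    mapping: dict[str, list[str]] = {}
    for doc in docs:
        mid = _field(doc, "material_id")
        if mid is None:
            continue  # skip documents that came back without an ID
        tids = _field(doc, "task_ids") or []
        merged = set(mapping.get(str(mid), [])) | {str(t) for t in tids}
        mapping[str(mid)] = sorted(merged)
    return mapping
```

IDs are converted with `str()` so that ID objects and plain strings end up as the same dictionary keys.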
@munrojm I haven’t converted any code to the new API yet. Is it stable now? And do I understand correctly that there’s no known way to get deprecated materials from the old API?
Hi @janosh, sorry for the late reply. Things are fairly stable with the new API client. However, I would maybe hold off a little bit longer until it is closer to public release if your stuff is mission critical. You should still be able to use the existing API if you have the task_id for the deprecated calculation.
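For the existing API, a sketch of fetching data for known deprecated task_ids might look like the following. It assumes the legacy pymatgen client (`pymatgen.ext.matproj.MPRester`) and its `get_task_data(task_id, prop)` method; whether a given deprecated task is still served is not guaranteed. The client is passed in as an argument so the loop can be exercised without network access, and the function name is hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterable, Protocol


class TaskClient(Protocol):
    """Anything with a legacy-style get_task_data(task_id, prop) method."""

    def get_task_data(self, chemsys_formula_id: str, prop: str = "") -> Any: ...


def fetch_tasks(
    client: TaskClient, task_ids: Iterable[str], prop: str = ""
) -> tuple[dict[str, Any], dict[str, str]]:
    """Return ({task_id: data}, {task_id: error message}) for the requested tasks."""
    found: dict[str, Any] = {}
    failed: dict[str, str] = {}
    for tid in task_ids:
        try:
            # An empty prop asks the legacy endpoint for its default set of properties
            found[tid] = client.get_task_data(tid, prop=prop)
        except Exception as exc:  # e.g. MPRestError for a task the API no longer serves
            failed[tid] = str(exc)
    return found, failed
```

With the legacy client this would be called as `found, failed = fetch_tasks(m, task_ids)` inside a `with MPRester(api_key) as m:` block, where `task_ids` are the deprecated calculations you want to recover.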