Material ID mp-1249732 returns nothing on the MP website and is not in the list of all entries I downloaded with `mpr.materials.thermo.search()`. However, several of the task IDs associated with it (including the material ID itself), namely mp-1249732, mp-1263333, and mp-1872191, are in the list of valid entries I downloaded with `pd.read_parquet(f"s3://materialsproject-build/collections/{DB_VERSION}/task-validation/manifest.parquet")`, filtering for `.valid == True`, with `DB_VERSION = '2024-11-14'`.
Why would a valid task ID have a material ID that’s not in the entry database?
On a related note, what are the possible reasons this material ID might have no entry to begin with (independent of the task validity)? I know all the old Yb results were removed due to the PAW problem, but that’s definitely not the cause here (Al-Cu-Sb-Si-O).
The ID mp-1249732 is a non-deprecated task ID, and its parent material ID, mp-1044336, is in MP:
```python
from mp_api.client import MPRester

with MPRester() as mpr:
    # Look up the material document that contains this task ID
    mat_doc = mpr.materials.search(task_ids=["mp-1249732"])[0]

print(repr(mat_doc.material_id))
# MPID(mp-1044336)
print(sorted(task_id for task_id in mat_doc.task_ids if task_id not in mat_doc.deprecated_tasks))
# [MPID(mp-1044336), MPID(mp-1249732), MPID(mp-1252077), MPID(mp-1252112), MPID(mp-1257436), MPID(mp-1259927), MPID(mp-1263333), MPID(mp-1353680), MPID(mp-1872191), MPID(mvc-9133)]
```
For building thermo docs, we often have multiple tasks with the same run type (e.g., PBE GGA static). Within a given run type, only one task is used: the one with the lowest energy per atom. So a valid, non-deprecated task is not necessarily the one used for the entry.
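A minimal sketch of that selection rule, assuming each task is a plain dict with hypothetical keys `task_id`, `run_type`, and `energy_per_atom` (this is not the actual MP task schema or builder code, only an illustration of "lowest energy per atom within a run type"):

```python
def pick_representative_tasks(tasks):
    """Return {run_type: task_id}, keeping the lowest-energy-per-atom task per run type.

    `tasks` is an iterable of dicts with hypothetical keys
    "task_id", "run_type" and "energy_per_atom" (eV/atom).
    """
    best = {}
    for task in tasks:
        current = best.get(task["run_type"])
        # Replace the stored task only if this one is strictly lower in energy
        if current is None or task["energy_per_atom"] < current["energy_per_atom"]:
            best[task["run_type"]] = task
    return {run_type: task["task_id"] for run_type, task in best.items()}


# Made-up tasks for one material: two GGA statics and one r2SCAN static
tasks = [
    {"task_id": "task-a", "run_type": "GGA static", "energy_per_atom": -5.10},
    {"task_id": "task-b", "run_type": "GGA static", "energy_per_atom": -5.12},
    {"task_id": "task-c", "run_type": "r2SCAN static", "energy_per_atom": -9.80},
]
print(pick_representative_tasks(tasks))
# {'GGA static': 'task-b', 'r2SCAN static': 'task-c'}
```

In this toy case, task-a stays a perfectly valid task but is never the one used for the GGA static entry.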
Thanks for clarifying. I guess the fundamental problem is that in MPtrj, that task ID (mp-1249732) was labelled with the wrong material ID (i.e., the same ID as the task). Yet another issue to watch out for in MPtrj.
However, since I want to be able to fix these issues myself, I now realize that I don’t know how to find the correct material ID for each task. I don’t see the parent material ID you associated with it (mp-1044336) anywhere in this task’s records in the downloaded all-tasks directory. How do I determine the correct parent material ID?
The set of material IDs is a subset of the set of task IDs, so given a generic ID, you can check this from the API:
```python
from mp_api.client import MPRester

query_id = "mp-1249732"  # any task ID
with MPRester() as mpr:
    mat_doc = mpr.materials.search(task_ids=[query_id])[0]
# True only if the queried ID is also the ID of the material document
is_material = mat_doc.material_id == query_id
```
If `is_material` is `True`, then `query_id` is both a task ID and the material ID. If it’s `False`, then `query_id` is only a task ID.
Building on that, you can get a mapping of all non-deprecated task IDs to their parent material ID:
```python
# Every non-deprecated task in this document maps to the document's material ID
mapping = {
    task_id: mat_doc.material_id
    for task_id in mat_doc.task_ids
    if task_id not in mat_doc.deprecated_tasks
}
```
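With such a task-to-material mapping collected over many documents (assumed here to be a plain dict of ID strings), the original question can be checked mechanically: a valid task ID is looked up through its parent material ID instead of being treated as a material ID itself. A minimal sketch of that cross-check; the function name and IDs are made up, and it assumes `valid_task_ids` comes from the validation manifest and `thermo_material_ids` from the material IDs of the thermo documents you downloaded:

```python
def classify_valid_tasks(valid_task_ids, task_to_material, thermo_material_ids):
    """Sort valid task IDs by whether their parent material has a thermo entry.

    valid_task_ids: iterable of task ID strings
    task_to_material: dict of task ID -> parent material ID (non-deprecated tasks)
    thermo_material_ids: set of material IDs that have a thermo entry
    """
    groups = {"parent_has_entry": [], "parent_has_no_entry": [], "unmapped": []}
    for task_id in valid_task_ids:
        material_id = task_to_material.get(task_id)
        if material_id is None:
            # Not found under any material in the mapping (possibly deprecated)
            groups["unmapped"].append(task_id)
        elif material_id in thermo_material_ids:
            groups["parent_has_entry"].append(task_id)
        else:
            groups["parent_has_no_entry"].append(task_id)
    return groups


# Made-up IDs: task-2 is a non-representative task of material task-1
valid_task_ids = ["task-1", "task-2", "task-3", "task-4"]
task_to_material = {"task-1": "task-1", "task-2": "task-1", "task-3": "task-3"}
thermo_material_ids = {"task-1"}
print(classify_valid_tasks(valid_task_ids, task_to_material, thermo_material_ids))
# {'parent_has_entry': ['task-1', 'task-2'], 'parent_has_no_entry': ['task-3'], 'unmapped': ['task-4']}
```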
Thanks. To avoid many further queries on your server, can I bulk download this for all materials (ideally restricted to just the `task_ids` and `deprecated_tasks` lists, I guess)? If so, what do I pass to `mpr.materials.search(...)`?
`mpr.materials.search()` will retrieve all of the documents efficiently. Thanks for checking!
Thanks. I’ll try to do it a minimal number of times and cache the results.
I suggest running it just once to pull all documents and then saving them in blocks if need be. For example, if you only need the mapping between material IDs and task IDs:
```python
from monty.serialization import dumpfn
from mp_api.client import MPRester

with MPRester() as mpr:
    # Restrict the download to the three fields needed for the mapping
    mat_docs = mpr.materials.search(fields=["material_id", "task_ids", "deprecated_tasks"])

# material ID -> list of its non-deprecated task IDs, as plain strings
material_to_tasks = {
    mat_doc.material_id.string: [
        task_id.string for task_id in mat_doc.task_ids if task_id not in mat_doc.deprecated_tasks
    ]
    for mat_doc in mat_docs
}
dumpfn(material_to_tasks, "material_id_to_task_ids.json.gz")
```
This saves the mapping to a gzipped JSON file.
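To answer the "correct parent material ID for each task" question from that file, the mapping has to be inverted. A minimal sketch using only the standard library, assuming the file holds a plain `{material_id: [task_ids]}` dict of strings as written by the `dumpfn` call above; the function name is hypothetical and the demo uses a small made-up file:

```python
import gzip
import json
import os
import tempfile


def load_task_to_material(path):
    """Read a gzipped JSON {material_id: [task_ids]} file and invert it.

    Returns {task_id: material_id}. If a task ID appeared under two
    materials, the one read last would win.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        material_to_tasks = json.load(f)
    task_to_material = {}
    for material_id, task_ids in material_to_tasks.items():
        for task_id in task_ids:
            task_to_material[task_id] = material_id
    return task_to_material


# Demo on a made-up file with the same layout as material_id_to_task_ids.json.gz
demo = {"mat-A": ["mat-A", "task-2"], "mat-B": ["mat-B"]}
demo_path = os.path.join(tempfile.mkdtemp(), "demo.json.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as f:
    json.dump(demo, f)

task_to_material = load_task_to_material(demo_path)
print(task_to_material["task-2"])  # mat-A
```

With the real file, looking up "mp-1249732" should give its parent material ID, provided the dump comes from the same database version as the query shown earlier in the thread.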
Thanks. Is there any prefab way to dump the entire `mat_doc`, e.g. convert it to a nested dict so I can just use `json`?
Yes, using `dumpfn` as in the code snippet in my previous message:
```python
from monty.serialization import dumpfn

# mat_docs is the list returned by mpr.materials.search(...)
dumpfn(mat_docs, "mat_docs.json")  # file name of your choice
```
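If the full set of documents is too large for a single file, it can be saved in blocks, as suggested earlier in the thread. A minimal stdlib sketch, assuming the documents have already been turned into JSON-serializable dicts (how to convert the API document objects is left open here); the function names, file-name pattern, and default block size are all hypothetical:

```python
import gzip
import json
import os
import tempfile


def save_in_blocks(docs, out_dir, block_size=10000, prefix="mat_docs"):
    """Write `docs` (a list of JSON-serializable dicts) to gzipped JSON files
    holding at most `block_size` documents each; return the paths in order."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for start in range(0, len(docs), block_size):
        block = docs[start:start + block_size]
        path = os.path.join(out_dir, f"{prefix}_{start // block_size:04d}.json.gz")
        with gzip.open(path, "wt", encoding="utf-8") as f:
            json.dump(block, f)
        paths.append(path)
    return paths


def load_blocks(paths):
    """Read the blocks back into a single list, preserving order."""
    docs = []
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8") as f:
            docs.extend(json.load(f))
    return docs


# Demo with made-up documents
docs = [{"material_id": f"mat-{i}", "task_ids": [f"task-{i}"]} for i in range(5)]
paths = save_in_blocks(docs, tempfile.mkdtemp(), block_size=2)
print(len(paths))  # 3
```

Keeping each block as a self-contained JSON list means any single file can be reloaded on its own without reading the others.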
@noam.bernstein In addition to @Aaron_Kaplan’s solutions, you can also directly download the underlying `.jsonl.gz` files from our AWS OpenData repo using the AWS CLI. See AWS OpenData | Materials Project Documentation.