MP API vs. REST API discrepancy

Martin_Siron1 · November 15, 2024, 8:02am

Hi all,

I have some questions regarding the MP API:

When using the Python API, for example:

from mp_api.client import MPRester
with MPRester() as mpr:
    task = mpr.materials.tasks.search(task_ids=["mp-1943879"])[0]
    print(task.entry.parameters["run_type"])

I can see that this task used R2SCAN.

But when I print task.input.parameters, I cannot see anything that would indicate it is indeed an R2SCAN calculation:

There is no METAGGA tag, and the GGA tag is empty ('---').

Further, when I use the MP REST API for this same task for the entry endpoint:

https://api.materialsproject.org/materials/tasks/entries/?task_ids=mp-1943879&_per_page=100&_skip=0&_limit=100

I am not able to retrieve the run_type at all: it is listed as "null". Furthermore, the POTCAR spec field is also empty, which is not the case when using the MP API in Python.

And when I look at the overall task endpoint:

https://api.materialsproject.org/materials/tasks/?task_ids=mp-1943879&_per_page=100&_skip=0&_limit=100&_all_fields=true

Here, orig_inputs.incar does show that this is an R2SCAN calculation (METAGGA="R2SCAN"). However, input.parameters does not show that it is an R2SCAN calculation.
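For anyone wanting to compare the two fields programmatically, here is a minimal standard-library sketch. It assumes the REST endpoint quoted above accepts the `X-API-KEY` header and wraps results in a `{"data": [...]}` envelope, and that the parsed inputs live under `input` (the post also calls it `inputs`, so both spellings are tried). `fetch_task` and `metagga_sources` are hypothetical helper names, not part of mp_api.

```python
import json
import urllib.parse
import urllib.request

# Endpoint quoted in the post; the header name and response envelope are assumptions.
TASKS_URL = "https://api.materialsproject.org/materials/tasks/"


def fetch_task(task_id: str, api_key: str) -> dict:
    """Hypothetical helper: fetch one task document (all fields) from the REST API."""
    query = urllib.parse.urlencode({"task_ids": task_id, "_all_fields": "true"})
    request = urllib.request.Request(f"{TASKS_URL}?{query}", headers={"X-API-KEY": api_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"][0]


def metagga_sources(task: dict) -> dict:
    """Report where (if anywhere) the meta-GGA setting appears in a task document."""
    orig_incar = (task.get("orig_inputs") or {}).get("incar") or {}
    # Parsed-input field name is assumed; try both spellings used in the thread.
    parsed_input = task.get("input") or task.get("inputs") or {}
    parameters = parsed_input.get("parameters") or {}
    return {
        "orig_inputs.incar.METAGGA": orig_incar.get("METAGGA"),
        "input.parameters.METAGGA": parameters.get("METAGGA"),
        "input.parameters.GGA": parameters.get("GGA"),
    }
```

Usage (needs network access and a valid key): `metagga_sources(fetch_task("mp-1943879", api_key))`. For a task affected by the issue described in this thread, you would expect a value under `orig_inputs.incar.METAGGA` but `None` under `input.parameters.METAGGA`.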

Couple questions:

What is the reason for the potential mismatch?
What is the difference between an entry and a task? Is an entry a processed pymatgen ComputedStructureEntry that belongs to a task?
Is it 1:1? If so, why is entry an array for tasks?
If it's not, where do task-level quantities such as forces come from? Is only one entry chosen for this?
My understanding is that one task is a single VASP calculation, and that a material could have many tasks, e.g. one per calculation type. Is that right?
Why might a single material have, for example, 6 different GGA Structure Optimization tasks?
Are 5 of these always deprecated?

Thanks so much for all your time and help!

Aaron_Kaplan · November 15, 2024, 11:23pm

Hi @Martin_Siron1,

The issue you noted with task.input.parameters is a VASP quirk: INCAR tags like METAGGA aren't actually written to vasprun.xml, which is used both to populate the parameters field and to determine the task_type, run_type, and calc_type. See this related GH discussion.

If you pip install --upgrade emmet-core, that should resolve this issue and correctly identify r²SCAN tasks as such (just double checked with emmet-core==0.84.2).
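Before re-checking the run type, it can help to confirm which emmet-core you actually have installed. A small stdlib sketch follows; `release_tuple` and `emmet_core_at_least` are hypothetical helpers, and the version comparison is deliberately crude (it ignores pre/post-release suffixes), so `packaging.version` would be the more robust choice if it is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum version mentioned in the reply above.
MIN_EMMET_CORE = "0.84.2"


def release_tuple(v: str) -> tuple:
    """Rough 'X.Y.Z' -> (X, Y, Z) conversion; keeps only leading digits of each part."""
    parts = []
    for piece in v.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def emmet_core_at_least(minimum: str = MIN_EMMET_CORE) -> bool:
    """True if the installed emmet-core is at least `minimum`; False if older or absent."""
    try:
        installed = version("emmet-core")
    except PackageNotFoundError:
        return False
    return release_tuple(installed) >= release_tuple(minimum)


print("emmet-core >=", MIN_EMMET_CORE, ":", emmet_core_at_least())
```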

To answer some of your other questions (ping me if I miss anything/am unclear):

A task represents one DFT calculation and contains all input/output parsed from that calculation. The tasks in MP are used to build all other processed/analyzed ("built") data.
Larger output data (CHGCAR, AECCAR*, LOCPOT) are served separately from tasks for data-management reasons.
An entry (like a ComputedStructureEntry) contains only the information from a task that is essential for later analysis, e.g. for building thermo docs.
Properties like energies, forces, and stresses come directly from a single DFT calculation, i.e. a single task.
Each material can be built from multiple tasks. These could be calculations like an r2SCAN geometry optimization or a separate band structure calculation. You can see the mapping between tasks and a material in the origins field of a SummaryDoc.
Each material has a different number of deprecated tasks, for any of various reasons (wrong/insufficient calculation parameters, wrong structure, etc.).
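As a rough illustration of reading the origins field from a materials/summary document retrieved as JSON: the sketch assumes each origin entry carries a `name` (the property, e.g. "structure" or "energy") and a `task_id`. The function name and the example document (including its IDs) are made up for illustration.

```python
def origin_task_ids(material_doc: dict) -> dict:
    """Map each property name in a document's `origins` list to its source task_id."""
    mapping = {}
    for origin in material_doc.get("origins") or []:
        name, task_id = origin.get("name"), origin.get("task_id")
        if name and task_id:
            mapping[name] = task_id
    return mapping


# Made-up example document (not real MP data).
example_doc = {
    "material_id": "mat-X",
    "origins": [
        {"name": "structure", "task_id": "task-A"},
        {"name": "energy", "task_id": "task-B"},
    ],
}
print(origin_task_ids(example_doc))  # {'structure': 'task-A', 'energy': 'task-B'}
```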

Martin_Siron1 · November 18, 2024, 12:44pm

Thanks @Aaron_Kaplan @tschaume, this is super helpful!!

On a related question: a material might have 2+ tasks, for example a GGA Structure Optimization and a GGA NSCF Uniform, and both tasks have a bandgap parsed. Which bandgap is then displayed on MP? If a material does not have an NSCF Uniform task, which bandgap is used then?

As a follow-up: there are 153k materials, but when I count all of the non-deprecated GGA tasks I get:

“GGA Structure Optimization” - count: 170250
“GGA+U Structure Optimization” - count: 95360

Might a material have multiple GGA Structure Optimization tasks? And would a material have both a GGA Structure Optimization task and a GGA+U Structure Optimization task if it falls in the GGA+U category (oxides, fluorides + some transition metals)? Or should it only have GGA+U and no GGA?

Similarly, I see 63k band structure calculations but the following number of tasks for these:

“GGA NSCF Uniform” - count: 118769
“GGA+U NSCF Uniform” - count: 27090

And why might there be tasks with a calc_type of NSCF Uniform and a run_type of either GGA Static or GGA NSCF Uniform? What is the difference between an NSCF Uniform with a GGA Static run_type vs. an NSCF Uniform run_type?

Could you explain why there might be this discrepancy?

Thanks again!

Martin_Siron1 · November 18, 2024, 4:31pm

To expand further: for one material, e.g. mp-33302, I see the following task_ids and calc_types:

{
  "task_ids": [
    "mp-531644",
    "mp-530434",
    "mp-531759",
    "mp-531304",
    "mp-744290",
    "mp-531411",
    "mp-530836",
    "mp-530534",
    "mp-530099",
    "mp-532435",
    "mp-530487",
    "mp-531049",
    "mp-532002",
    "mp-531642",
    "mp-530366",
    "mp-531743",
    "mp-33302",
    "mp-531460",
    "mp-530342",
    "mp-530548",
    "mp-532239",
    "mp-531457",
    "mp-530496",
    "mp-531211",
    "mp-530741",
    "mp-530904",
    "mp-531537",
    "mp-721783",
    "mp-531754",
    "mp-530283",
    "mp-531452",
    "mp-1353560",
    "mp-531468",
    "mp-532442",
    "mp-531631",
    "mp-531070",
    "mp-531477",
    "mp-531144",
    "mp-532206",
    "mp-531250",
    "mp-530122",
    "mp-530201",
    "mp-33323",
    "mp-734480",
    "mp-530906",
    "mp-530581",
    "mp-530755",
    "mp-530440",
    "mp-532503",
    "mp-530664",
    "mp-532493",
    "mp-531300",
    "mp-531625",
    "mp-531571",
    "mp-531221",
    "mp-531665",
    "mp-530837",
    "mp-532184",
    "mp-530414",
    "mp-530491",
    "mp-532242",
    "mp-530442",
    "mp-531851",
    "mp-530530",
    "mp-531874",
    "mp-530368",
    "mp-530532",
    "mp-531404",
    "mp-530225",
    "mp-532114",
    "mp-530260",
    "mp-531233",
    "mp-530796",
    "mp-530718",
    "mp-532450",
    "mp-531418",
    "mp-532354",
    "mp-532320",
    "mp-532429",
    "mp-532374",
    "mp-531982",
    "mp-532418",
    "mp-531407",
    "mp-531188",
    "mp-530131",
    "mp-530477",
    "mp-531086",
    "mp-531433",
    "mp-531885",
    "mp-532192",
    "mp-531258",
    "mp-531142",
    "mp-530657",
    "mp-530675",
    "mp-530604",
    "mp-530183",
    "mp-532018",
    "mp-532272",
    "mp-532011",
    "mp-531967",
    "mp-531621",
    "mp-532068",
    "mp-531007",
    "mp-530501",
    "mp-531904",
    "mp-532473",
    "mp-532145",
    "mp-531126",
    "mp-532314",
    "mp-532474",
    "mp-530500",
    "mp-530802",
    "mp-532174",
    "mp-530731",
    "mp-531732",
    "mp-531898",
    "mp-531423",
    "mp-532275",
    "mp-532111",
    "mp-532494",
    "mp-532303",
    "mp-530660",
    "mp-530903",
    "mp-530419",
    "mp-531217",
    "mp-531977",
    "mp-532125",
    "mp-531101",
    "mp-530699",
    "mp-531839",
    "mp-532048",
    "mp-530705",
    "mp-531906",
    "mp-532389",
    "mp-531546",
    "mp-531122",
    "mp-530481",
    "mp-531845",
    "mp-1938803",
    "mp-531630",
    "mp-531047",
    "mp-530443"
  ],
  "calc_types": {
    "mp-721783": "GGA Static",
    "mp-734480": "GGA NSCF Uniform",
    "mp-530131": "GGA Structure Optimization",
    "mp-531304": "GGA Structure Optimization",
    "mp-530368": "GGA Structure Optimization",
    "mp-531904": "GGA Structure Optimization",
    "mp-532314": "GGA Structure Optimization",
    "mp-532493": "GGA Structure Optimization",
    "mp-530660": "GGA Structure Optimization",
    "mp-531221": "GGA Structure Optimization",
    "mp-532494": "GGA Structure Optimization",
    "mp-531839": "GGA Structure Optimization",
    "mp-531967": "GGA Structure Optimization",
    "mp-531101": "GGA Structure Optimization",
    "mp-532206": "GGA Structure Optimization",
    "mp-531404": "GGA Structure Optimization",
    "mp-532242": "GGA Structure Optimization",
    "mp-530906": "GGA Structure Optimization",
    "mp-532192": "GGA Structure Optimization",
    "mp-531144": "GGA Structure Optimization",
    "mp-532503": "GGA Structure Optimization",
    "mp-530664": "GGA Structure Optimization",
    "mp-530530": "GGA Structure Optimization",
    "mp-531845": "GGA Structure Optimization",
    "mp-531546": "GGA Structure Optimization",
    "mp-530731": "GGA Structure Optimization",
    "mp-530903": "GGA Structure Optimization",
    "mp-530491": "GGA Structure Optimization",
    "mp-530581": "GGA Structure Optimization",
    "mp-530755": "GGA Structure Optimization",
    "mp-531007": "GGA Structure Optimization",
    "mp-531851": "GGA Structure Optimization",
    "mp-532320": "GGA Structure Optimization",
    "mp-531754": "GGA Structure Optimization",
    "mp-530802": "GGA Structure Optimization",
    "mp-530836": "GGA Structure Optimization",
    "mp-532239": "GGA Structure Optimization",
    "mp-530283": "GGA Structure Optimization",
    "mp-530705": "GGA Structure Optimization",
    "mp-531874": "GGA Structure Optimization",
    "mp-530741": "GGA Structure Optimization",
    "mp-532374": "GGA Structure Optimization",
    "mp-532473": "GGA Structure Optimization",
    "mp-530604": "GGA Structure Optimization",
    "mp-530342": "GGA Structure Optimization",
    "mp-532011": "GGA Structure Optimization",
    "mp-530183": "GGA Structure Optimization",
    "mp-531407": "GGA Structure Optimization",
    "mp-530548": "GGA Structure Optimization",
    "mp-530201": "GGA Structure Optimization",
    "mp-530904": "GGA Structure Optimization",
    "mp-531743": "GGA Structure Optimization",
    "mp-532474": "GGA Structure Optimization",
    "mp-531977": "GGA Structure Optimization",
    "mp-530225": "GGA Structure Optimization",
    "mp-530477": "GGA Structure Optimization",
    "mp-531621": "GGA Structure Optimization",
    "mp-531630": "GGA Structure Optimization",
    "mp-532145": "GGA Structure Optimization",
    "mp-531885": "GGA Structure Optimization",
    "mp-530675": "GGA Structure Optimization",
    "mp-530481": "GGA Structure Optimization",
    "mp-532275": "GGA Structure Optimization",
    "mp-532174": "GGA Structure Optimization",
    "mp-532002": "GGA Structure Optimization",
    "mp-531049": "GGA Structure Optimization",
    "mp-530442": "GGA Structure Optimization",
    "mp-532272": "GGA Structure Optimization",
    "mp-531732": "GGA Structure Optimization",
    "mp-531457": "GGA Structure Optimization",
    "mp-530443": "GGA Structure Optimization",
    "mp-530657": "GGA Structure Optimization",
    "mp-532068": "GGA Structure Optimization",
    "mp-531452": "GGA Structure Optimization",
    "mp-531537": "GGA Structure Optimization",
    "mp-531898": "GGA Structure Optimization",
    "mp-530496": "GGA Structure Optimization",
    "mp-530414": "GGA Structure Optimization",
    "mp-531468": "GGA Structure Optimization",
    "mp-531423": "GGA Structure Optimization",
    "mp-530532": "GGA Structure Optimization",
    "mp-531418": "GGA Structure Optimization",
    "mp-531411": "GGA Structure Optimization",
    "mp-530260": "GGA Structure Optimization",
    "mp-531631": "GGA Structure Optimization",
    "mp-531906": "GGA Structure Optimization",
    "mp-531086": "GGA Structure Optimization",
    "mp-531642": "GGA Structure Optimization",
    "mp-530718": "GGA Structure Optimization",
    "mp-530419": "GGA Structure Optimization",
    "mp-33302": "GGA Structure Optimization",
    "mp-531644": "GGA Structure Optimization",
    "mp-531982": "GGA Structure Optimization",
    "mp-531047": "GGA Structure Optimization",
    "mp-532184": "GGA Structure Optimization",
    "mp-531233": "GGA Structure Optimization",
    "mp-532389": "GGA Structure Optimization",
    "mp-531571": "GGA Structure Optimization",
    "mp-530366": "GGA Structure Optimization",
    "mp-531142": "GGA Structure Optimization",
    "mp-532114": "GGA Structure Optimization",
    "mp-532435": "GGA Structure Optimization",
    "mp-530500": "GGA Structure Optimization",
    "mp-532450": "GGA Structure Optimization",
    "mp-530501": "GGA Structure Optimization",
    "mp-532442": "GGA Structure Optimization",
    "mp-531433": "GGA Structure Optimization",
    "mp-532354": "GGA Structure Optimization",
    "mp-531460": "GGA Structure Optimization",
    "mp-530837": "GGA Structure Optimization",
    "mp-531759": "GGA Structure Optimization",
    "mp-531300": "GGA Structure Optimization",
    "mp-532429": "GGA Structure Optimization",
    "mp-531122": "GGA Structure Optimization",
    "mp-531126": "GGA Structure Optimization",
    "mp-530099": "GGA Structure Optimization",
    "mp-33323": "GGA Structure Optimization",
    "mp-532418": "GGA Structure Optimization",
    "mp-531477": "GGA Structure Optimization",
    "mp-530434": "GGA Structure Optimization",
    "mp-532125": "GGA Structure Optimization",
    "mp-530699": "GGA Structure Optimization",
    "mp-530440": "GGA Structure Optimization",
    "mp-531217": "GGA Structure Optimization",
    "mp-532048": "GGA Structure Optimization",
    "mp-531188": "GGA Structure Optimization",
    "mp-530534": "GGA Structure Optimization",
    "mp-530487": "GGA Structure Optimization",
    "mp-531665": "GGA Structure Optimization",
    "mp-531625": "GGA Structure Optimization",
    "mp-530796": "GGA Structure Optimization",
    "mp-531258": "GGA Structure Optimization",
    "mp-531211": "GGA Structure Optimization",
    "mp-531070": "GGA Structure Optimization",
    "mp-532303": "GGA Structure Optimization",
    "mp-532018": "GGA Structure Optimization",
    "mp-531250": "GGA Structure Optimization",
    "mp-532111": "GGA Structure Optimization",
    "mp-530122": "GGA Structure Optimization",
    "mp-1353560": "GGA Static",
    "mp-1938803": "GGA Static",
    "mp-744290": "GGA NSCF Uniform"
  }
}

How do I know which GGA Structure Optimization was used? Why does this material have more than 100 Structure Optimization tasks? How do I know which GGA Static was used for the energy?

Martin_Siron1 · November 18, 2024, 5:55pm

To add a bit more detail: I looked at only one material, say mp-553919, and took its last NSCF Uniform task and its last Structure Optimization task.

I see that the last GGA+U NSCF Uniform for this material was run in 2014, but another GGA+U Structure Relaxation was performed in 2022. Should I assume that this relaxation changed the structure little enough that the 2014 GGA+U NSCF Uniform would be more or less the same had it used the atomic positions from the 2022 Structure Optimization?

In the Database Changelog I don’t see something that would indicate why this material underwent another Structure Optimization in 2022.

Aaron_Kaplan · November 21, 2024, 5:55pm

We periodically redo calculations, usually in the interest of accuracy (updating the computational parameters used in a calc). For most materials, you’ll see multiple structure optimizations listed in the task_ids field of a materials document (MPDataDoc).

Some of those tasks may be deprecated, meaning that they are not used to build any property of the material. You can see which ones are deprecated in the deprecated_tasks attribute of an MPDataDoc.
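To see which tasks are still "live" for a material, one option is to subtract deprecated_tasks from task_ids and group the rest by calc_type. The sketch below assumes the JSON layout shown earlier in the thread (`task_ids` list, `deprecated_tasks` list, and a `calc_types` mapping from task_id to calc_type); the function name and example document are hypothetical.

```python
from collections import defaultdict


def active_tasks_by_calc_type(material_doc: dict) -> dict:
    """Group the non-deprecated task_ids of a materials document by calc_type."""
    deprecated = set(material_doc.get("deprecated_tasks") or [])
    calc_types = material_doc.get("calc_types") or {}
    grouped = defaultdict(list)
    for task_id in material_doc.get("task_ids") or []:
        if task_id not in deprecated:
            # Tasks missing from calc_types are bucketed under "unknown".
            grouped[calc_types.get(task_id, "unknown")].append(task_id)
    return dict(grouped)


# Made-up example document (not real MP data).
doc = {
    "task_ids": ["task-1", "task-2", "task-3", "task-4"],
    "deprecated_tasks": ["task-2"],
    "calc_types": {
        "task-1": "GGA Structure Optimization",
        "task-2": "GGA Structure Optimization",
        "task-3": "GGA Static",
        "task-4": "GGA NSCF Uniform",
    },
}
for calc_type, ids in active_tasks_by_calc_type(doc).items():
    print(f"{calc_type}: {len(ids)} active -> {ids}")
```

Summing these counts over all materials is one way to reproduce per-calc_type totals like the ones quoted earlier, keeping in mind that several non-deprecated tasks of the same calc_type can still map to one material.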

And why might there be both tasks with a calc_type of NSCF Uniform and a run_type of either GGA Static or GGA NSCF Uniform? What is the difference between a NSCF Uniform with GGA Static run_type vs. NSCF Uniform run_type?

There shouldn't be a mismatch: calc_type is the union of task_type and run_type. We're putting out a release soon that should correct some mislabeled task_types. I would check whether upgrading emmet-core to the newest version and/or the updated DB release fixes these mismatched entries.
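A quick way to flag such mismatches in bulk is to rebuild the expected label from run_type and task_type and compare it with calc_type. This sketch assumes the label is simply the two joined with a space, which is how labels like "GGA Static" and "GGA+U Structure Optimization" read; the helper names and the records are made up.

```python
def expected_calc_type(run_type: str, task_type: str) -> str:
    """Compose a calc_type label as run_type + ' ' + task_type (assumed convention)."""
    return f"{run_type} {task_type}"


def mismatched_tasks(tasks: list) -> list:
    """Return task_ids whose calc_type disagrees with their run_type/task_type pair."""
    bad = []
    for task in tasks:
        expected = expected_calc_type(task["run_type"], task["task_type"])
        if task["calc_type"] != expected:
            bad.append(task["task_id"])
    return bad


# Made-up records shaped like the mismatch reported later in this thread.
records = [
    {"task_id": "task-a", "run_type": "GGA", "task_type": "Static", "calc_type": "GGA Static"},
    {"task_id": "task-b", "run_type": "GGA", "task_type": "NSCF Uniform", "calc_type": "GGA Static"},
]
print(mismatched_tasks(records))  # ['task-b']
```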

How do I know which GGA Structure Optimization was used? Why does this material have >100 tasks each for Structure Optimization? How do I know which GGA Static was used for energy?

You can look in the origins field to see which task was used for the structure and/or energy; see my answer to your other post.

In general, it's not possible to know why certain tasks were performed, but for this material, looking at its provenance is helpful:

from mp_api.client import MPRester
with MPRester() as mpr:
    provenance = mpr.materials.provenance.search(material_ids=["mp-33302"])[0]
print(provenance.remarks)

Looking at provenance.remarks shows that this is an ordered representation of a configurationally disordered structure. The many tasks matched to this material are then probably different ordered representations from either SQS or enumeration, and only the lowest energy structure is used to build the material (corresponding to the ground state structure).
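A toy illustration of the selection rule just described (the lowest-energy ordering wins). This is not emmet's builder logic, which also groups and validates tasks; the function name, task IDs, and energies are invented. Energies per atom are compared rather than total energies, since different ordered representations can have different cell sizes.

```python
def ground_state_task(candidates: list, deprecated=frozenset()) -> str:
    """Pick the non-deprecated candidate with the lowest energy per atom."""
    usable = [c for c in candidates if c["task_id"] not in deprecated]
    if not usable:
        raise ValueError("no non-deprecated candidates")
    best = min(usable, key=lambda c: c["energy_per_atom"])
    return best["task_id"]


# Made-up energies (eV/atom) for three hypothetical orderings.
orderings = [
    {"task_id": "task-1", "energy_per_atom": -5.102},
    {"task_id": "task-2", "energy_per_atom": -5.118},
    {"task_id": "task-3", "energy_per_atom": -5.120},
]
print(ground_state_task(orderings))  # task-3
print(ground_state_task(orderings, deprecated={"task-3"}))  # task-2
```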

Should I assume that the relaxation from that changed little enough that the 2014 GGA+U NSCF Uniform should still be more or less the same had it followed the atom positions from the 2022 Structure Optimization?

You can always check if structures are similar using pymatgen’s StructureMatcher:

from pymatgen.analysis.structure_matcher import StructureMatcher, ElementComparator

matcher = StructureMatcher(comparator=ElementComparator())
matcher.fit(structure_1, structure_2)

where structure_1 and structure_2 are the structures you want to compare. If the matcher returns True, you can reasonably assume that the structures are almost identical up to PBC.

Martin_Siron1 · November 22, 2024, 10:46am

Great, thanks a lot for all this, this really helps clear up the appearance of a mismatch!

And as for this:

The information I got was actually scraped directly from the REST API, so there isn't another library in between to parse this extra information. Do you suggest relying only on the MP API (Python client) rather than the REST API going forward, or will the DB be updated so that the REST API data matches the MP API data?

Aaron_Kaplan · November 25, 2024, 1:59pm

It depends on your use case, but for data retrieval, using the Materials Project API client (mp_api) is probably optimal. mp_api retrieves the same data you're retrieving. If you can send a few task IDs where you see inconsistencies between the task / run / calc type, that can help us validate our coming build/data release.

tschaume · November 25, 2024, 5:51pm

In addition to @Aaron_Kaplan's suggestions, we highly recommend using our Python client mp-api as described in our docs to retrieve data from the MP API. The library is optimized for efficient usage of the shared resources running the API and database servers. There's no guarantee that specific endpoints in the API will stay the same forever, but we keep the Python client updated to deal with those changes and avoid breaking changes for our users as much as possible. At this point, there aren't any clients in other languages officially supported by MP.

Martin_Siron1 · November 26, 2024, 9:13am

Here is a short list of task_IDs where the calc_type is GGA Static for all, but the task_type is NSCF Uniform:

mp-999986
mp-999925
mp-999948
mp-999975
mp-1021827
mp-999977
mp-1021833
mp-999985
mp-999991
mp-1000352
mp-1011641
mp-1021826
mp-999926
mp-1000009
mp-1000013
mp-1000007
mp-968940
mp-999956
mp-1000356