Mismatch between the structure in the materials document and the structure in the task document in the MP API

Hello all,

When I look at a material, for example mp-753537, and retrieve its materials doc:
mat_doc = mpr.materials.search(material_ids=["mp-753537"])[0]

I see that it has the task ID mp-1363005, and that mat_doc has a structure with 14 sites:
len(mat_doc.structure)

When I then look at the task doc for mp-1363005:
task_doc = mpr.materials.tasks.search('mp-1363005')[0]
I see that its output structure has 28 sites:
len(task_doc.output.structure)

Could you explain the discrepancy? Is it because the structure at the materials level is stored as the conventional standard structure? If so, is that done with pymatgen's SpacegroupAnalyzer? And why does the task use a larger cell?

Thank you!

Hi @Martin_Siron1, the mapping from tasks to materials is many-to-one: many tasks are grouped into a single material. Thus a task that shares an MP ID with a material does not necessarily correspond to all of that material's properties.
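To make the many-to-one relationship concrete, here is a minimal offline sketch that uses plain dictionaries instead of real API documents. The six task IDs are the ones listed further down this thread for mp-23835; MATERIAL_TASKS and build_task_index are hypothetical names for illustration, not part of mp_api.

```python
# Hypothetical stand-in data: material ID -> IDs of the tasks grouped into it.
# The six task IDs for mp-23835 are the ones quoted later in this thread.
MATERIAL_TASKS = {
    "mp-23835": [
        "mp-1890103", "mp-740464", "mp-23835",
        "mp-727196", "mp-860675", "mp-860081",
    ],
}


def build_task_index(material_tasks: dict[str, list[str]]) -> dict[str, str]:
    """Invert material -> tasks into task -> material (many tasks, one material)."""
    index: dict[str, str] = {}
    for material_id, task_ids in material_tasks.items():
        for task_id in task_ids:
            index[task_id] = material_id
    return index


task_index = build_task_index(MATERIAL_TASKS)

# All six tasks point back to the same material...
print(set(task_index.values()))  # {'mp-23835'}
# ...and the task whose ID matches the material ID is only one of them.
print(len(task_index))  # 6
```

Sharing an ID only tells you that a task was grouped into the material; it says nothing about which properties were built from that task. That information lives in the origins field.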

You can see which tasks were used to build some of a material's properties by looking at the origins field of its SummaryDoc:

from mp_api.client import MPRester

with MPRester() as mpr:
    summary_doc = mpr.materials.summary.search(material_ids=["mp-753537"])[0]
    print(summary_doc.origins)
    # Find the origin entry recording which task the structure was built from
    for origin in summary_doc.origins:
        if origin.name == "structure":
            break
    task = mpr.materials.tasks.search(task_ids=[origin.task_id])[0]
    print(task.structure.num_sites)

The first print should show you something like this:

[PropertyOrigin(name='structure', task_id=MPID(mp-1319701), last_updated=datetime.datetime(2020, 4, 29, 23, 3, 15, 704000)), PropertyOrigin(name='energy', task_id=MPID(mp-1319701), last_updated=datetime.datetime(2023, 8, 3, 20, 42, 6, 351000)), PropertyOrigin(name='magnetism', task_id=MPID(mp-1319701), last_updated=datetime.datetime(2020, 4, 29, 23, 3, 15, 704000))]

and the second print should show that the task from which the structure of mp-753537 was built has 14 sites.
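The loop above leaves its loop variable bound to the last origin if no entry is named "structure", so a missing origin would silently point at the wrong task. Below is a minimal offline sketch of a stricter lookup using the origins printed above. Origin is a hypothetical dataclass that only mirrors the name, task_id and last_updated fields shown in that output (the real objects are PropertyOrigin instances), and origin_task_id is likewise a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Origin:
    """Hypothetical stand-in mirroring the fields printed for each PropertyOrigin."""
    name: str
    task_id: str
    last_updated: datetime


# The three origins printed above.
ORIGINS = [
    Origin("structure", "mp-1319701", datetime(2020, 4, 29, 23, 3, 15, 704000)),
    Origin("energy", "mp-1319701", datetime(2023, 8, 3, 20, 42, 6, 351000)),
    Origin("magnetism", "mp-1319701", datetime(2020, 4, 29, 23, 3, 15, 704000)),
]


def origin_task_id(origins: list[Origin], prop: str) -> Optional[str]:
    """Return the task ID a property was built from, or None if it has no origin entry."""
    return next((o.task_id for o in origins if o.name == prop), None)


print(origin_task_id(ORIGINS, "structure"))  # mp-1319701
print(origin_task_id(ORIGINS, "band_gap"))  # None
```

Since the printed objects expose name and task_id attributes, the same next(...) expression should also work directly on summary_doc.origins.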

Thanks @Aaron_Kaplan, super helpful!

But this leads to another question. Here is some sample code:

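from mp_api.client import MPRester

with MPRester() as mpr:
    summary_doc = mpr.materials.summary.search(material_ids=["mp-23835"])[0]
    print("band_gap in summary doc:", summary_doc.band_gap)
    # Print the output band gap of every task grouped into this material
    for task_id in summary_doc.task_ids:
        task_doc = mpr.materials.tasks.search(task_ids=[str(task_id)])[0]
        print(task_id, "band_gap:", task_doc.output.bandgap)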

At the summary doc level, the band gap is 2.7234 eV. But the band gaps of the tasks that belong to this material are:

  • mp-1890103 band_gap: 2.7689
  • mp-740464 band_gap: 2.7644
  • mp-23835 band_gap: 2.9080000000000004
  • mp-727196 band_gap: 2.9078999999999997
  • mp-860675 band_gap: 2.7644
  • mp-860081 band_gap: 2.7644

None of these match the band gap at the material level, and the origins field only covers energy, structure and magnetism, not electronic properties. So where does the material-level band gap come from?

In this case, the band gap comes from the DOS of task mp-860081. However, that task is deprecated (as the materials endpoint shows) because it used too few k-points.

Tagging @tsmathis, since we were looking at cases like this during our recent rebuild.

This should be another issue fixed in the coming build (certain materials have band gaps built from deprecated tasks).

Code reference:

from mp_api.client import MPRester

with MPRester() as mpr:
    mat_doc = mpr.materials.search(material_ids="mp-23835")[0]
    summ_doc = mpr.materials.summary.search(material_ids="mp-23835")[0]
    es_doc = mpr.materials.electronic_structure.search(material_ids=["mp-23835"])[0]

# Does the summary band gap come from the electronic structure doc, and is its source task deprecated?
print(es_doc.band_gap == summ_doc.band_gap, es_doc.task_id in mat_doc.deprecated_tasks)

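This should print True True: the summary band gap matches the electronic structure band gap, and the task that band gap was built from is listed in the material's deprecated_tasks.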
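The same reasoning can be followed offline with the numbers quoted in this thread. In this minimal sketch the task band gaps and the summary value are copied from the posts above, while DEPRECATED_TASKS is an assumption containing only mp-860081, the one task the thread identifies as deprecated (the real list is mat_doc.deprecated_tasks). The helper matching_tasks and its 1e-3 eV tolerance are illustrative choices, not part of mp_api.

```python
# Values quoted in this thread for mp-23835, in eV.
SUMMARY_BAND_GAP = 2.7234
TASK_BAND_GAPS = {
    "mp-1890103": 2.7689,
    "mp-740464": 2.7644,
    "mp-23835": 2.9080000000000004,
    "mp-727196": 2.9078999999999997,
    "mp-860675": 2.7644,
    "mp-860081": 2.7644,
}
# Assumption: only the task named in the thread; the real list is mat_doc.deprecated_tasks.
DEPRECATED_TASKS = {"mp-860081"}
# Per the reply above, the summary band gap comes from this task's DOS.
BAND_GAP_SOURCE_TASK = "mp-860081"


def matching_tasks(gaps: dict[str, float], target: float, tol: float = 1e-3) -> list[str]:
    """Return the task IDs whose output band gap lies within tol of target."""
    return sorted(task for task, gap in gaps.items() if abs(gap - target) <= tol)


# No task-level output band gap reproduces the summary value...
print(matching_tasks(TASK_BAND_GAPS, SUMMARY_BAND_GAP))  # []
# ...consistent with the reply: the value came from the DOS of a deprecated task.
print(BAND_GAP_SOURCE_TASK in DEPRECATED_TASKS)  # True
```

With the real deprecated_tasks list substituted in, a check like this is a cheap way to flag other materials whose band gap may have been built from a deprecated task.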