Inconsistency between Materials Project Database bandgap and pymatgen calculated bandgap

Hi there,

I’ve noticed a significant discrepancy between the bandgap values obtained from the Materials Project Database and those calculated by pymatgen using the same bandstructure data.

For example, for material mp-753512:

from mp_api.client import MPRester

mp_id = "mp-753512"
with MPRester() as mpr:
    # Fetch the band structure
    bandstructure = mpr.get_bandstructure_by_material_id(mp_id)
    
    # Fetch the material summary using the new API method
    material = mpr.materials.summary.search(material_ids=[mp_id])[0]
    
    # Get detailed band gap and band edge information
    band_gap_info = bandstructure.get_band_gap()
    cbm_info = bandstructure.get_cbm()
    vbm_info = bandstructure.get_vbm()
    
    # Print the full results
    print(f"\nBandgap Analysis for {mp_id}:")
    print("-" * 50)
    print(f"MP Database Bandgap: {material.band_gap:.3f} eV")
    print(f"Pymatgen Calculated Bandgap: {band_gap_info['energy']:.3f} eV")
    print(f"Is Metal: {bandstructure.is_metal()}")
    
    print("\nBand Gap Details:")
    print(f"Direct Gap: {band_gap_info['direct']}")
    if 'transition' in band_gap_info:
        print(f"Transition: {band_gap_info['transition']}")
    
    print("\nBand Edge Information:")
    print("CBM:")
    print(f"  Energy: {cbm_info['energy']:.3f} eV")
    print(f"  k-point: {cbm_info['kpoint']} {cbm_info.get('cart_coords', '')} {cbm_info.get('label', '')}")
    
    print("VBM:")
    print(f"  Energy: {vbm_info['energy']:.3f} eV")
    print(f"  k-point: {vbm_info['kpoint']} {vbm_info.get('cart_coords', '')} {vbm_info.get('label', '')}")

The results show:

Bandgap Analysis for mp-753512:
--------------------------------------------------
MP Database Bandgap: 0.000 eV
Pymatgen Calculated Bandgap: 0.559 eV
Is Metal: False

Band Gap Details:
Direct Gap: False
Transition: D-E

Band Edge Information:
CBM:
  Energy: 4.872 eV
  k-point: [0.5 0.5 0.5] [1.00573415 0.66644364 0.2209705 ] E  
VBM:
  Energy: 4.314 eV
  k-point: [0.5 0.  0.5] [1.00573415 0.         0.22151277] D  
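As a quick consistency check on the output above, the gap reported by pymatgen should equal the CBM energy minus the VBM energy. A minimal sketch using only the rounded values printed above (the variable names are illustrative, and the tolerance is an assumption chosen to absorb the 3-decimal rounding):

```python
# Rounded band-edge energies copied from the output above (eV).
cbm_energy = 4.872
vbm_energy = 4.314

# The gap is the energy difference between the two band edges.
gap_from_edges = cbm_energy - vbm_energy

# Gap reported by bandstructure.get_band_gap()["energy"], rounded to 3 decimals.
reported_gap = 0.559

# All inputs are rounded to 3 decimals, so allow about 1 meV of rounding error.
consistent = abs(gap_from_edges - reported_gap) <= 1.5e-3
print(f"CBM - VBM = {gap_from_edges:.3f} eV, consistent with reported gap: {consistent}")
```

So the 0.559 eV value is internally consistent with the band edges; the open question is only why the database summary reports 0.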

Questions:

  1. Why is there such a large difference between the database value and the calculated value?
  2. If the material has a non-zero bandgap and is classified as non-metallic by pymatgen, why does the MP database show it as having zero bandgap?
  3. Is there a specific threshold below which bandgaps are considered as zero in the MP database?

Any insights into this discrepancy would be greatly appreciated. This understanding would help users better interpret the bandgap data from both sources.

See here for the initial posting of this problem.

Regards,
Zhao

This has to do with a bug we’re aware of, which should be fixed in a forthcoming data release.

Some tasks (a task is a single DFT calculation) are labelled with the wrong task type, which describes the kind of calculation performed. In this case, a line-mode scan of the band structure is mislabeled, so the corresponding band structure doesn’t get pulled into the summary document.

You can see all the tasks that build a materials document like this:

from mp_api.client import MPRester

with MPRester() as mpr:
    mat_doc = mpr.materials.search(material_ids=['mp-753512'])[0]
    # Keep only the tasks that have not been deprecated
    tasks = mpr.materials.tasks.search(task_ids=[task_id for task_id in mat_doc.task_ids if task_id not in mat_doc.deprecated_tasks])
    for task in tasks:
        print(task.task_id, task.task_type, task.output.bandgap)

This should print something like the following (the order doesn’t matter):

mp-809180 NSCF Uniform 0.5586000000000002
mp-1785566 Static 0.6380999999999997
mp-1333251 Static 0.0
mp-801798 Static 0.7089999999999996
mp-765868 Structure Optimization 0.7275999999999998
mp-1685581 NSCF Uniform 0.0

Clearly the first task, mp-809180, corresponds to the band structure you obtained with get_bandstructure_by_material_id, and checking by hand confirms that it is a line-mode band structure calculation. However, it is mislabeled as uniform, so it does not correctly populate the band_gap field in the summary doc.
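One way to see which task a band structure came from is to match its gap against the per-task gaps. A minimal offline sketch using the values printed above (the tuples are copied by hand rather than fetched from the API, and `find_matching_tasks` is a hypothetical helper, not part of mp_api):

```python
import math

# (task_id, task_type, bandgap in eV), copied from the output above.
tasks = [
    ("mp-809180", "NSCF Uniform", 0.5586000000000002),
    ("mp-1785566", "Static", 0.6380999999999997),
    ("mp-1333251", "Static", 0.0),
    ("mp-801798", "Static", 0.7089999999999996),
    ("mp-765868", "Structure Optimization", 0.7275999999999998),
    ("mp-1685581", "NSCF Uniform", 0.0),
]


def find_matching_tasks(tasks, target_gap, abs_tol=1e-3):
    """Return the tasks whose band gap matches target_gap within abs_tol (eV)."""
    return [t for t in tasks if math.isclose(t[2], target_gap, abs_tol=abs_tol)]


# 0.559 eV is the rounded gap pymatgen reports for the band structure object.
matches = find_matching_tasks(tasks, target_gap=0.559)
for task_id, task_type, gap in matches:
    print(task_id, task_type, gap)
```

Matching on the gap alone is only a heuristic; two tasks could share a gap by coincidence, so a match still has to be confirmed by inspecting the task itself.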

However, using the same code snippet, I obtained different results:

from mp_api.client import MPRester

with MPRester() as mpr:
    mat_doc = mpr.materials.search(material_ids=['mp-753512'])[0]
    tasks = mpr.materials.tasks.search(task_ids=[task_id for task_id in mat_doc.task_ids if task_id not in mat_doc.deprecated_tasks])
    for task in tasks:
        print(task.task_id, task.task_type, task.output.bandgap)

The results:

mp-1785566 Static 0.638099999999999
mp-1333251 Static 0.0
mp-801798 Static 0.7089999999999991
mp-1685581 NSCF Uniform 0.0
mp-808519 NSCF Line 0.5808999999999991
mp-765868 Structure Optimization 0.727599999999999
mp-809180 NSCF Line 0.5586000000000001

The detailed differences between our results are shown below:

# These are your results:
werner@x13dai-t:~$ cat yours 
mp-809180 NSCF Uniform 0.5586000000000002
mp-1785566 Static 0.6380999999999997
mp-1333251 Static 0.0
mp-801798 Static 0.7089999999999996
mp-765868 Structure Optimization 0.7275999999999998
mp-1685581 NSCF Uniform 0.0

# These are my results:
werner@x13dai-t:~$ cat mine 
mp-1785566 Static 0.638099999999999
mp-1333251 Static 0.0
mp-801798 Static 0.7089999999999991
mp-1685581 NSCF Uniform 0.0
mp-808519 NSCF Line 0.5808999999999991
mp-765868 Structure Optimization 0.727599999999999
mp-809180 NSCF Line 0.5586000000000001
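The two listings can also be compared by task ID instead of line by line, which sidesteps the differing order. A minimal sketch with the values copied by hand from the listings above (the dictionary names simply mirror the two files):

```python
# task_id -> (task_type, bandgap in eV), copied from the two listings above.
yours = {
    "mp-809180": ("NSCF Uniform", 0.5586000000000002),
    "mp-1785566": ("Static", 0.6380999999999997),
    "mp-1333251": ("Static", 0.0),
    "mp-801798": ("Static", 0.7089999999999996),
    "mp-765868": ("Structure Optimization", 0.7275999999999998),
    "mp-1685581": ("NSCF Uniform", 0.0),
}
mine = {
    "mp-1785566": ("Static", 0.638099999999999),
    "mp-1333251": ("Static", 0.0),
    "mp-801798": ("Static", 0.7089999999999991),
    "mp-1685581": ("NSCF Uniform", 0.0),
    "mp-808519": ("NSCF Line", 0.5808999999999991),
    "mp-765868": ("Structure Optimization", 0.727599999999999),
    "mp-809180": ("NSCF Line", 0.5586000000000001),
}

# Task IDs that appear in only one of the two listings.
only_in_mine = sorted(mine.keys() - yours.keys())
only_in_yours = sorted(yours.keys() - mine.keys())

# Tasks present in both listings whose task type differs: (type in yours, type in mine).
retyped = {
    task_id: (yours[task_id][0], mine[task_id][0])
    for task_id in sorted(yours.keys() & mine.keys())
    if yours[task_id][0] != mine[task_id][0]
}

print("Only in mine:", only_in_mine)
print("Only in yours:", only_in_yours)
print("Task type changed:", retyped)
```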

We just released the new database where those task type corrections were implemented (Changelog). That’s why you’re seeing NSCF Line whereas a few days ago, the task type was NSCF Uniform (when I ran it). The floats are identical up to their numeric representation.
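The point about the floats can be checked with a tolerance-based comparison. A minimal sketch, with the gap values copied by hand from the two listings and a tolerance chosen here purely for illustration:

```python
import math

# Band gaps (eV) for tasks present in both listings: (earlier value, later value).
gap_pairs = {
    "mp-809180": (0.5586000000000002, 0.5586000000000001),
    "mp-1785566": (0.6380999999999997, 0.638099999999999),
    "mp-801798": (0.7089999999999996, 0.7089999999999991),
    "mp-765868": (0.7275999999999998, 0.727599999999999),
}

# Compare with a tolerance rather than ==, since the trailing digits are floating-point noise.
all_equal = all(math.isclose(a, b, abs_tol=1e-9) for a, b in gap_pairs.values())
largest_diff = max(abs(a - b) for a, b in gap_pairs.values())

print("All gaps agree within tolerance:", all_equal)
print("Largest absolute difference (eV):", largest_diff)
```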

However, the issue initially reported in this discussion still persists:

Bandgap Analysis for mp-753512:
--------------------------------------------------
MP Database Bandgap: 0.000 eV
Pymatgen Calculated Bandgap: 0.559 eV
Is Metal: False

Band Gap Details:
Direct Gap: False
Transition: D-E

Band Edge Information:
CBM:
  Energy: 4.872 eV
  k-point: [0.5 0.5 0.5] [1.00573415 0.66644364 0.2209705 ] E  
VBM:
  Energy: 4.314 eV
  k-point: [0.5 0.  0.5] [1.00573415 0.         0.22151277] D  

I have also run the following check to get more details about these tasks:

from mp_api.client import MPRester

with MPRester() as mpr:
    # Fetch the materials document
    mat_doc = mpr.materials.search(material_ids=['mp-753512'])[0]
    
    # Fetch the non-deprecated tasks
    tasks = mpr.materials.tasks.search(task_ids=[
        task_id for task_id in mat_doc.task_ids 
        if task_id not in mat_doc.deprecated_tasks
    ])
    
    # Print each task's details and band gap
    for task in tasks:
        print(f"Task ID: {task.task_id}")
        print(f"Task Type: {task.task_type}")
        print(f"Bandgap: {task.output.bandgap}")
        print(f"Last Updated: {task.last_updated}")  # also show the last-updated time
        print("-" * 50)

The results:

Task ID: mp-1785566
Task Type: Static
Bandgap: 0.638099999999999
Last Updated: 2020-11-12 03:39:23.679000
--------------------------------------------------
Task ID: mp-1333251
Task Type: Static
Bandgap: 0.0
Last Updated: 2020-04-29 21:53:32.752000
--------------------------------------------------
Task ID: mp-808519
Task Type: NSCF Line
Bandgap: 0.5808999999999991
Last Updated: 2014-04-30 04:25:28
--------------------------------------------------
Task ID: mp-1685581
Task Type: NSCF Uniform
Bandgap: 0.0
Last Updated: 2020-07-15 19:57:16.745000
--------------------------------------------------
Task ID: mp-801798
Task Type: Static
Bandgap: 0.7089999999999991
Last Updated: 2014-04-18 14:54:32
--------------------------------------------------
Task ID: mp-809180
Task Type: NSCF Line
Bandgap: 0.5586000000000001
Last Updated: 2021-02-24 11:29:10.998000
--------------------------------------------------
Task ID: mp-765868
Task Type: Structure Optimization
Bandgap: 0.727599999999999
Last Updated: 2014-02-15 14:44:42
--------------------------------------------------

OK, I’m seeing what else might be going on here. For this particular material, the NSCF Uniform calculation, which is used when you call MPRester().get_dos_by_material_id, shows no (or a very small) band gap:

>>> dos = mpr.get_dos_by_material_id('mp-753512')
>>> dos.get_gap()
0.0
>>> dos.get_cbm_vbm()
(5.123, 5.1559)
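The zero gap is what one would expect when the band edges overlap: in the values above, the reported CBM lies below the VBM. The sketch below assumes the DOS gap is the CBM minus the VBM, clamped at zero; that is an assumption about the behavior rather than a copy of the library code, and `gap_from_edges` is a hypothetical helper:

```python
def gap_from_edges(cbm, vbm):
    """Return the gap in eV, treating overlapping band edges as a zero gap."""
    return max(cbm - vbm, 0.0)


# (cbm, vbm) as returned by dos.get_cbm_vbm() above, in eV.
cbm, vbm = 5.123, 5.1559

print(cbm - vbm)                 # negative: the band edges overlap
print(gap_from_edges(cbm, vbm))  # clamped to zero
```

With the line-mode band edges from the original post (CBM 4.872 eV, VBM 4.314 eV), the same helper gives a positive gap, which is the discrepancy under discussion.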

The other piece to note is that the magnetic configuration of the structure in the DOS calculation differs from that of the total energy calculation. There’s a warning printed in the Electronic Structure tab of the website, but that should also get added to the API.

I will need to look into this more and see if it’s an issue with the calculations.