(Bug in the DB?) Band gap for mp-1211100 became 0...

Hi! I noticed that the band gap for mp-1211100 is currently shown as 0 (both in the web interface and through the MP API). It was 6.1396 eV as of early 2024, and it still is in the legacy web interface.

What is the reason for this change? Is this a bug? Has this material been re-calculated?

Thanks!

I see the following bit in the changelog for v2024.11.14:

21,144 tasks were incorrectly assigned a task_type of NSCF Uniform when they were really NSCF Line. NSCF Uniform tasks are used to calculate DOSes, NSCF Line tasks are used to generate band structure scans. These and associated properties in materials/summary (band gaps, DOS, etc.) have been corrected.

Could the bug potentially have happened during this update?

Hi @SiLiKhon, thanks for reporting this. It doesn’t look like the task type change is the culprit here, since two different Static tasks have very different band gaps (mp-1342265 has a 6.1404 eV band gap, whereas mp-1776854 has a 0 eV gap). The Static task type is distinct from both NSCF Line and NSCF Uniform.

We’ll have to look into this more, I’ll follow up with any updates!

Thanks @Aaron_Kaplan for your comment.

Meanwhile, I did a quick comparison of my query cache from early 2024 against the results of the same query as of today with mp-api==0.45.1, and I see a lot of differences.

The query is the following:

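from mp_api.client import MPRester

# Assumes the API key is available via the MP_API_KEY environment variable
with MPRester() as mpr:
    docs = mpr.materials.summary.search(
        elements=["Li"],
        band_gap=(0.5, 100),  # eV
        energy_above_hull=(0.0, 0.05),  # eV/atom
        fields=["material_id", "formula_pretty", "nelements", "elements",
                "nsites", "structure", "energy_above_hull", "volume", "band_gap"],
    )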
List of material IDs only in early 2024

mp-1029385, mp-1079418, mp-1079491, mp-1100430, mp-1105785, mp-1147598, mp-1173959, mp-1174205, mp-1174220, mp-1174262, mp-1174283, mp-1174285, mp-1174519, mp-1174737, mp-1176564, mp-1176626, mp-1176631, mp-1176632, mp-1176675, mp-1176702, mp-1176706, mp-1176726, mp-1177332, mp-1177369, mp-1177503, mp-1185301, mp-1192295, mp-1192707, mp-1194609, mp-1207120, mp-1208619, mp-1210869, mp-1210977, mp-1211100, mp-1218250, mp-1220728, mp-1220967, mp-1222358, mp-1222370, mp-1222452, mp-1222626, mp-1223097, mp-1223686, mp-1227900, mp-1228109, mp-1244614, mp-1247100, mp-1304584, mp-1638873, mp-1641248, mp-1666404, mp-16699, mp-18911, mp-19043, mp-23736, mp-25812, mp-25943, mp-26015, mp-26051, mp-26135, mp-26141, mp-26197, mp-26199, mp-26817, mp-29582, mp-31614, mp-555112, mp-556229, mp-558765, mp-560058, mp-606680, mp-699599, mp-704183, mp-705334, mp-705429, mp-752869, mp-753429, mp-754232, mp-754266, mp-755151, mp-755505, mp-755562, mp-756108, mp-756218, mp-756480, mp-756827, mp-757295, mp-758346, mp-758472, mp-758538, mp-759819, mp-760016, mp-760293, mp-760317, mp-766011, mp-766507, mp-766814, mp-766827, mp-767762, mp-768534, mp-768550, mp-768581, mp-768653, mp-768689, mp-770852, mp-771579, mp-772945, mp-774458, mp-775182, mp-775256, mp-777404, mp-777767, mp-778813, mp-778860, mp-779356, mp-780308, mp-780482, mp-780571, mp-8152, mp-8204, mp-849450, mp-850906, mp-861245, mp-9159, mp-942701, mp-989504, mp-989536, mp-996987, mp-998230

List of material IDs only in 2025.01.22 result

mp-1029294, mp-1029424, mp-1029517, mp-1032403, mp-1032414, mp-1034976, mp-1045540, mp-1174308, mp-1175374, mp-1176890, mp-1176904, mp-1177014, mp-1177991, mp-1196541, mp-1200046, mp-1247242, mp-1293168, mp-1304518, mp-1661789, mp-1666476, mp-19117, mp-1976674, mp-2033990, mp-2206511, mp-25397, mp-25963, mp-26105, mp-26683, mp-26846, mp-2750540, mp-2755986, mp-2761389, mp-2761607, mp-2763329, mp-2763586, mp-2763849, mp-2763899, mp-2764466, mp-2764589, mp-2766181, mp-2767670, mp-2897115, mp-2900776, mp-2911947, mp-2912291, mp-2912355, mp-504371, mp-505431, mp-532590, mp-532789, mp-540462, mp-554967, mp-555626, mp-559443, mp-561743, mp-584012, mp-585269, mp-6721, mp-672966, mp-695021, mp-696441, mp-697792, mp-723831, mp-752460, mp-752770, mp-752796, mp-752921, mp-752993, mp-753039, mp-753063, mp-753171, mp-753202, mp-753247, mp-753341, mp-753405, mp-753432, mp-753447, mp-753609, mp-753729, mp-753946, mp-754082, mp-754091, mp-754195, mp-754330, mp-754784, mp-754987, mp-755180, mp-755278, mp-755379, mp-755397, mp-755638, mp-755653, mp-755734, mp-756422, mp-756577, mp-756730, mp-756814, mp-757072, mp-757075, mp-757100, mp-757344, mp-757558, mp-757602, mp-757613, mp-757646, mp-757753, mp-757866, mp-757930, mp-758042, mp-758062, mp-758320, mp-758480, mp-758589, mp-758642, mp-758838, mp-758937, mp-759056, mp-759118, mp-759213, mp-759224, mp-759234, mp-759305, mp-759492, mp-759552, mp-759712, mp-759767, mp-759775, mp-759828, mp-759877, mp-759901, mp-760255, mp-760329, mp-760815, mp-760997, mp-761124, mp-761142, mp-761149, mp-761602, mp-761676, mp-761897, mp-762326, mp-763519, mp-763578, mp-764063, mp-764125, mp-764764, mp-764969, mp-764994, mp-765615, mp-765629, mp-766206, mp-766444, mp-766459, mp-766523, mp-767588, mp-768084, mp-768527, mp-768544, mp-768634, mp-768697, mp-768927, mp-768935, mp-768946, mp-769631, mp-769962, mp-770078, mp-770495, mp-770504, mp-770510, mp-770628, mp-770667, mp-770941, mp-770958, mp-771151, mp-771155, mp-772004, mp-772319, mp-772868, 
mp-773002, mp-773015, mp-773087, mp-773112, mp-773122, mp-773564, mp-774230, mp-774245, mp-774249, mp-774305, mp-774465, mp-774781, mp-774798, mp-774894, mp-774905, mp-775027, mp-775161, mp-775194, mp-775656, mp-775818, mp-775982, mp-776035, mp-776243, mp-776396, mp-776446, mp-776449, mp-776451, mp-776477, mp-776482, mp-776591, mp-776741, mp-776762, mp-777351, mp-777445, mp-777462, mp-779308, mp-779719, mp-780156, mp-780159, mp-780728, mp-781628, mp-787524, mp-850214, mp-850338, mp-850358, mp-850362, mp-850954, mp-851029, mp-865101

For the common IDs, this shows whether the values are identical for every ID:

band_gap             False
elements              True
energy_above_hull    False
formula_pretty        True
nelements             True
nsites               False
structure            False
volume               False

Here is a histogram of band gap change between the two versions for the common IDs:
[Image: histogram of the band gap change between the two versions for the common IDs]

Let me know if you’d like me to share my cache from early 2024.
Hope this info helps the investigations.
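
A comparison like the one described above can be sketched with plain dictionaries keyed by material ID. This is only an illustration: the helper name `compare_snapshots`, the record layout, and the toy IDs and values are assumptions, not the actual cache format or real MP data.

```python
def compare_snapshots(old, new, fields):
    """Compare two {material_id: {field: value}} snapshots.

    Returns the IDs present only in `old`, the IDs present only in `new`,
    and, for each field, whether it is identical for every common ID.
    """
    only_old = sorted(set(old) - set(new))
    only_new = sorted(set(new) - set(old))
    common = sorted(set(old) & set(new))
    identical = {
        field: all(old[mid][field] == new[mid][field] for mid in common)
        for field in fields
    }
    return only_old, only_new, identical


# Hypothetical records standing in for the cached and the current query results.
old = {
    "mp-A": {"band_gap": 6.14, "nsites": 24},
    "mp-B": {"band_gap": 1.20, "nsites": 8},
}
new = {
    "mp-B": {"band_gap": 1.35, "nsites": 8},
    "mp-C": {"band_gap": 0.90, "nsites": 4},
}

only_old, only_new, identical = compare_snapshots(old, new, ["band_gap", "nsites"])
print("only in old:", only_old)
print("only in new:", only_new)
print("identical per field:", identical)
```

The comparison here uses exact equality. For floating-point fields such as band_gap or volume, a tolerance-based check (for example `math.isclose`) may be a better fit, since tiny numerical differences would otherwise be flagged as changes.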

Hey @SiLiKhon, thanks for your patience. The way we parse band gaps changed slightly in November 2024 to improve reliability. The band gaps of some materials also changed then to use the r2SCAN band gap, which tends to be larger than the PBE gap. Other differences are accounted for by the change in NSCF task types.

Longer explanation:

In November 2024, we modified the schema for how we store information about VASP calculations as tasks. As part of that, a field called eigenvalue_band_properties was removed from the calculations field.

Now, there are many ways to estimate the band gap from either an approximate dispersion relationship/band structure or the DOS. Estimating the gap automatically can fail because of how the CBM, VBM, and Fermi level are determined.

The advantages of the current approach over using eigenvalue_band_properties are summarized in this pymatgen issue. The newer scheme is not as sensitive to Fermi surface broadening techniques. If you’re not familiar with this, it’s basically an approximation we use to treat the partial occupancy of states at the Fermi level in metals and ensure computational stability.

For the material you found, mp-1211100, the eigenvalue_band_properties field gave a very large band gap of 6.139 eV despite the fact that this older method also found the CBM to be just below the Fermi level.

That doesn’t make physical sense, and it’s likely that the band gap is actually 0 eV, consistent with the current value shown there. Indeed, there was also a recalculation of that material that found a 0 eV band gap.

Fewer than 1,000 materials used this less reliable band gap estimation method in the pre-November 2024 version of MP (thanks to @tsmathis for finding that!).

Thanks a lot @Aaron_Kaplan for the detailed response.

Is there any way to visualize the band structure from the mentioned calculation? I’ve tried these options, though none of them has worked:

  • exploring web interface for this material and tasks (couldn’t find the band structure there)
  • mpr.get_bandstructure_by_material_id("mp-1211100")
    • fails with exception that data was not found
  • mpr.materials.electronic_structure.search(material_ids=["mp-1211100"])
    • the bandstructure for the resulting document is None
  • mpr.materials.tasks.search(["mp-1776854"], fields=["task_id", "orig_inputs", "calcs_reversed", "output", "last_updated"])
    • don’t see anything related to the band structure in the output (except for bandgap=0.0)

Am I missing something? Maybe there is a way to explore raw VASP files?

Hey @SiLiKhon, sorry for the delayed response. For visualization, try pymatgen.electronic_structure.plotter.BSPlotter. Since there isn’t a calculation for mp-1211100 that is a sweep along high-symmetry k-points (NSCF Line task type), there’s no band structure in our published data products to visualize. We currently don’t perform a band structure calculation for all materials.

While we do have the raw VASP files, they’re not currently publicly accessible. That’s something we’re working on.

Hi @Aaron_Kaplan
Sorry for the delayed response from my side as well.

Thanks for the suggestion. I see that BSPlotter requires the set of k-points and corresponding eigenvalues. I understand how to get the k-points from the calculations for mp-1211100, but don’t seem to find the eigenvalues - are those public?

We observe from our own calculation (with a different code and, likely, a different XC functional) that the Fermi level is well below the CBM and well above the VBM, yielding a clean band gap of around 6 eV. So we would like to validate the updated MP number.

Thank you!

This should work, and returns a pymatgen BandStructure object. The eigenvalues are in the bands field:

from mp_api.client import MPRester

with MPRester("your api key") as mpr:
    mat_doc = mpr.materials.search(material_ids=["mp-1211100"])[0]
    bs = mpr.materials.electronic_structure_bandstructure.get_bandstructure_from_task_id(
        mat_doc.entries.GGA.data['task_id']
    )

Dear @Aaron_Kaplan! Thanks a lot for the instructions and so sorry for this super delayed response.

Upon checking the eigenvalues, it looks to me like the efermi value is wrong. When I count the number of levels below it, I get 841, while it should be 840 (40 electrons times 21 k-points; this assumes 3 electrons per Li, 3 per B and 1 per H, which for Li2B6H16 gives 40). I don’t see how the 841st level can be occupied, given that it’s 6 eV above the 840th.

Code to reproduce
import os
from mp_api.client import MPRester
import numpy as np
import dotenv
dotenv.load_dotenv()  # read api key from .env file


with MPRester(os.getenv("MP_API_KEY")) as mpr:
    [mat_doc,] = mpr.materials.search(material_ids=["mp-1211100"])
    bs = mpr.materials.electronic_structure_bandstructure.get_bandstructure_from_task_id(
        mat_doc.entries.GGA.data["task_id"]
    )


print("1. Counting levels below efermi")
all_energy_levels = np.array(list(bs.bands.values()))
print(f"{(all_energy_levels <  bs.efermi).sum() = :d}")
print(f"{(all_energy_levels <= bs.efermi).sum() = :d}")
# Assuming NELECT = 40:
print(f"{40 * len(bs.kpoints)                   = :d}")

print("\n2. Printing levels near efermi")
all_levels_sorted = np.sort(all_energy_levels.ravel())
i_above_efermi = (all_levels_sorted > bs.efermi).argmax()
for d_i in range(2, -4, -1):
    i = i_above_efermi + d_i
    # `i + 1` in the printout accounts for 0-based indexing in Python
    print(f" - level #{i + 1:d}  | {all_levels_sorted[i]:8.5f}")
    if d_i == 0:
        print(f"   (efermi)    | {bs.efermi:8.5f}")

Outputs:

1. Counting levels below efermi
(all_energy_levels <  bs.efermi).sum() = 841
(all_energy_levels <= bs.efermi).sum() = 841
40 * len(bs.kpoints)                   = 840

2. Printing levels near efermi
 - level #844  |  2.39330
 - level #843  |  2.39230
 - level #842  |  2.28770
   (efermi)    |  2.28682
 - level #841  |  2.28660
 - level #840  | -3.85300
 - level #839  | -3.85340

Running the same code on the older static task, mp-1342265, gives 840 levels below efermi, as expected.

(output for task mp-1342265)
1. Counting levels below efermi
(all_energy_levels <  bs.efermi).sum() = 840
(all_energy_levels <= bs.efermi).sum() = 840
40 * len(bs.kpoints)                   = 840

2. Printing levels near efermi
 - level #843  |  2.39310
 - level #842  |  2.28770
 - level #841  |  2.28730
   (efermi)    | -3.78421
 - level #840  | -3.85310
 - level #839  | -3.85320
 - level #838  | -3.86230

Am I missing something?
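
The levels printed above already show how strongly the reported gap depends on where the stored Fermi level sits. The following is a minimal sketch, not pymatgen's algorithm: the helper `gap_from_levels` is hypothetical and simply takes the difference between the lowest level above efermi and the highest level at or below it. It uses levels #839 to #844 as printed for mp-1776854, once with the efermi stored for that task and once with the efermi stored for mp-1342265 (whose levels are nearly the same).

```python
def gap_from_levels(levels, efermi):
    """Gap between the lowest level above efermi and the highest level at or below it.

    Simplified illustration only: it ignores band indices, k-points and any
    tolerance handling that a real band gap parser would apply.
    """
    occupied = [e for e in levels if e <= efermi]
    empty = [e for e in levels if e > efermi]
    if not occupied or not empty:
        raise ValueError("efermi must lie within the range of the given levels")
    return min(empty) - max(occupied)


# Levels #839-#844 (eV) as printed above for task mp-1776854.
levels = [-3.85340, -3.85300, 2.28660, 2.28770, 2.39230, 2.39330]

# efermi as stored for mp-1776854: it sits just above level #841.
gap_stored = gap_from_levels(levels, efermi=2.28682)

# efermi as stored for the older task mp-1342265: it sits just above level #840.
gap_shifted = gap_from_levels(levels, efermi=-3.78421)

print(f"efermi =  2.28682 eV -> gap = {gap_stored:.4f} eV")
print(f"efermi = -3.78421 eV -> gap = {gap_shifted:.4f} eV")
```

Under these assumptions the first case gives a gap of about 0.001 eV, while the second gives about 6.1396 eV, which matches the band gap that was shown for this material in early 2024.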

There was a small bug in older versions of pymatgen that likely gave inaccurate values of the gap (could place the Fermi level at the CBM). I just pulled the DOS for the particular task (mp-1776854) that defines the electronic structure for this material, and got a 5.84 eV gap using pymatgen==2025.5.28 (run pip install pymatgen==2025.5.28):

from mp_api.client import MPRester

with MPRester() as mpr:
    dos = mpr.materials.electronic_structure_dos.get_dos_from_task_id("mp-1776854")
print(dos.get_gap())
>>> 5.8354351031170255

Thank you for the quick response.

I’m getting the same dos band gap that you posted even with the older pymatgen==2025.1.9 that I’ve been using so far. And, I still see the problematic efermi and bandstructure band gap values with the new version you suggest.

Do you mean there was a bug in calculate_efermi, which is called when parsing VASP output (I’m linking the tags that I assume were used during the database update in late 2024)? By the way, is there any way to check the raw VASP efermi value prior to this processing?

The value of efermi saved in the tasks will still be wrong, because that value was precomputed. If you call a method like get_gap, it will recompute the VBM and CBM and give you a correct estimate of the gap.

We would need to reparse the data to correct the value of efermi that’s saved. A “hacky” way of looking at the correct band properties would be this:

from mp_api.client import MPRester
from pymatgen.electronic_structure.bandstructure import BandStructure

with MPRester() as mpr:
    band_struc_orig = mpr.materials.electronic_structure_bandstructure.get_bandstructure_from_task_id("mp-1776854")
    dos_orig = mpr.materials.electronic_structure_dos.get_dos_from_task_id("mp-1776854")

cbm, vbm = dos_orig.get_cbm_vbm()
band_struc_new = BandStructure(
    [kpt.frac_coords for kpt in band_struc_orig.kpoints],
    band_struc_orig.bands,
    band_struc_orig.lattice_rec,
    vbm,
    band_struc_orig.labels_dict,
    structure=band_struc_orig.structure,
)

Running band_struc_new.get_gap() yields a 6.14 eV direct gap.

Unfortunately, right now there’s no way to access the vasprun.xml or DOSCAR from which this data was extracted. We’re currently working on a solution for this (plus all other VASP outputs), but can’t give an estimate of when that will be final.

Thank you @Aaron_Kaplan. I confirm that this works for mp-1211100.
What would be the proper way to apply this workaround to arbitrary material_ids?

I tried it on a few random materials and here are some issues I spotted:

  1. Obtaining the right task id. I notice that get_bandstructure_from_task_id and get_dos_from_task_id calls fail for some tasks. Also, in some cases, all the attributes of MPDataDoc.entries are None, including GGA. I see some task info in MPDataDoc.origins, but it’s not clear which one to choose (+ sometimes they all fail).
  2. efermi value for BandStructure constructor. Is there a specific reason why you used vbm in your example? I also tried cbm and (cbm + vbm) / 2, which in most cases all agree in the resulting band gap, but sometimes the results are different (can be zero vs non-zero).
  1. You can generalize the method by looking first at the summary documents, and then seeing which tasks are used for either the bandstructure or dos attributes. The band structure and DOS will have an associated task ID, e.g.:
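from mp_api.client import MPRester

with MPRester() as mpr:
    summ_doc = mpr.materials.summary.search(material_ids=["mp-149"])[0]
if summ_doc.bandstructure:
    print(summ_doc.bandstructure.latimer_munro.task_id)
if summ_doc.dos:
    # Assumption: the DOS task ID is stored per spin channel under dos.total
    print(next(iter(summ_doc.dos.total.values())).task_id)
>>> mp-2693792
>>> mp-2250750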
  2. The Fermi level has some ambiguity in an insulator, since you can place it anywhere in the gap. But the safest bet for algorithmically determining band gaps is usually just to place it at the VBM, since this accounts for both metals and insulators.