DOS, Volume, and GGA

Minju · August 18, 2025, 3:00am

Hello everyone.
I have been thinking about this for the past few weeks, and there are a few points I’m still unsure about.
These may be basic questions, but even after reading other people’s Q&A, several parts remain unclear to me.

1. DOS data of GGA and r2SCAN calculation data
For the density of states, I obtain the energies and densities as follows and then use them for plotting.

dos          = mpr.get_dos_by_material_id(mid)
energies     = dos.energies            # numpy array
densities    = dos.get_densities()     # numpy array
efermi       = dos.efermi
idx          = (abs(energies - efermi)).argmin()
dos_at_fermi = densities[idx]
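
The nearest-grid-point lookup in the snippet above depends on how fine the energy grid is. As a minimal sketch on synthetic arrays (the helper name `dos_at_energy` and the data are made up for illustration; no API call is involved), the nearest-point value can be compared with linear interpolation between the two neighbouring grid points:

```python
import numpy as np

def dos_at_energy(energies, densities, e_ref, interpolate=True):
    """Return the DOS at e_ref from a DOS tabulated on an ascending energy grid."""
    energies = np.asarray(energies, dtype=float)
    densities = np.asarray(densities, dtype=float)
    if interpolate:
        # Linear interpolation between the two grid points that bracket e_ref
        return float(np.interp(e_ref, energies, densities))
    # Nearest grid point, as in the snippet above
    idx = np.abs(energies - e_ref).argmin()
    return float(densities[idx])

# Synthetic example: a DOS that rises linearly with energy on a coarse grid
energies = np.array([-1.0, 0.0, 1.0, 2.0])
densities = np.array([0.0, 2.0, 4.0, 6.0])
efermi = 0.4

nearest = dos_at_energy(energies, densities, efermi, interpolate=False)
interp = dos_at_energy(energies, densities, efermi)
print(nearest, interp)
```

On a fine grid the two values should nearly coincide; a large gap between them suggests the grid is too coarse near the Fermi level for a single-point read-off.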

I obtain the entry energy and the cohesive energy with the following code.
As I understand it, unlike the cohesive energy, entry values can differ between GGA and r2SCAN.

# Get GGA energy
thermo_docs = mpr.materials.thermo.search(material_id)
for thermo_doc in thermo_docs:
    if "GGA" in thermo_doc.entries:
        gga_energy_per_atom = thermo_doc.entries["GGA"].uncorrected_energy_per_atom  # GGA energy per atom
        total_energy = thermo_doc.entries["GGA"].uncorrected_energy  # total energy in GGA
        task_id = next(prop.task_id for prop in thermo_doc.origins if prop.name == "energy")
        composition = thermo_doc.composition.as_dict()

# Get Cohesive energy
cohesive_energy = mpr.get_cohesive_energy(
    material_ids=[material_id], normalization="atom"    # normalize with number of atoms
)[material_id]                                          # Results are returned as dictionary

I would expect the DOS to differ depending on whether r2SCAN or GGA is used. Which calculation method is the DOS in the database based on?
Since Materials Project also uses meta-GGA, is the DOS computed with r2SCAN on a structure that was optimized with GGA?

2. Normalization through Volume
I want to normalize the DOS through volume, and to do this, I need to know the structure of the cell used during calculation to add to the database. Are these calculation results from a primitive cell or a conventional cell?

Most data structures that can be obtained from the Materials Project website (to the extent of my knowledge, all of them) have a conventional cell structure. Would it be correct to use the volume of the conventional cell for normalization?
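
Whichever cell the calculation used, dividing by volume only makes sense if the DOS and the volume refer to the same cell. A minimal sketch with made-up numbers (the values and the helper name `normalize_by_volume` are illustrative, not Materials Project data): a conventional cell built from four primitive cells has four times the states and four times the volume, so the per-volume DOS is unchanged, while mixing the two cells is off by exactly that factor.

```python
import numpy as np

def normalize_by_volume(densities, volume):
    """Convert a DOS in states/eV/cell to states/eV/A^3 for a cell of the given volume."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return np.asarray(densities, dtype=float) / volume

# Made-up primitive-cell DOS (states/eV/cell) and volume (A^3)
dos_prim = np.array([0.5, 1.0, 1.5])
vol_prim = 16.0

# A conventional cell made of n_cells primitive cells: both quantities scale together
n_cells = 4
dos_conv = n_cells * dos_prim
vol_conv = n_cells * vol_prim

consistent_prim = normalize_by_volume(dos_prim, vol_prim)
consistent_conv = normalize_by_volume(dos_conv, vol_conv)
mismatched = normalize_by_volume(dos_prim, vol_conv)  # primitive DOS / conventional volume

print(np.allclose(consistent_prim, consistent_conv))
print(consistent_prim / mismatched)
```

The practical point is to take the volume from the structure attached to the DOS object itself rather than from a separately downloaded file.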

After seeing another user’s query and modifying the code to obtain the following data, I’m unsure what it means. In cases where both Conventional and Primitive are True, is the data uploaded based on a Conventional or Primitive structure?
Are the DOS and band structure data based on primitive cells or conventional cells?

Formula (Conventional structure)	MPID	DOS_Primitive	DOS_Conventional	BS_Primitive	BS_Conventional
Al1Co1	mp-284	True	True	True	True
Be1Co1	mp-2773	True	True	True	True
Be20Co4	mp-1071690	True	False	True	False
Be6Co2	mp-1183423	True	False	True	False
Ce8Co16	mp-1112	True	False	True	False
Co12B4	mp-20373	True	True	True	True
Co12S18	mp-1183728	True	False	True	False
Co12Se16	mp-20456	True	False	True	False
Co1I2	mp-569610	True	False	True	True

3. Difference between DOS calculated directly and DOS extracted from database

(The data above is raw data from the API, not normalized.)

I understand that there can be differences in the graph shape because the values are extracted from SCF calculation results.
(The reason for extracting from SCF is that rough data was needed for pre-screening.)
However, the actual density values appear lower than those obtained through the API. What is the reason for this?
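
One hedged way to narrow this down, independent of plotting, is to integrate each DOS up to the Fermi level. For a total DOS summed over spin, and provided the energy window covers all occupied valence states, the result should be approximately the number of valence electrons in the cell the DOS refers to. Two curves that integrate to different totals were therefore computed for different cell sizes, spin channels, or electron counts. The sketch below uses a synthetic flat DOS; `integrated_dos` is a made-up helper, not a pymatgen or mp-api function.

```python
import numpy as np

def integrated_dos(energies, densities, e_max):
    """Trapezoidal integral of the DOS from the start of the grid up to e_max.

    Integration stops at the last grid point that is <= e_max.
    """
    energies = np.asarray(energies, dtype=float)
    densities = np.asarray(densities, dtype=float)
    mask = energies <= e_max
    e, d = energies[mask], densities[mask]
    return float(np.sum(0.5 * (d[1:] + d[:-1]) * np.diff(e)))

# Synthetic data: a flat DOS of 2 states/eV, and the same DOS for a cell twice as large
energies = np.arange(-5, 6) * 1.0          # -5, -4, ..., 5 eV
dos_small_cell = np.full_like(energies, 2.0)
dos_large_cell = 2.0 * dos_small_cell
efermi = 0.0

n_small = integrated_dos(energies, dos_small_cell, efermi)
n_large = integrated_dos(energies, dos_large_cell, efermi)
print(n_small, n_large, n_large / n_small)
```

If the ratio of the two integrals matches the ratio of atom counts in the two cells, the difference in magnitude is a cell-size effect rather than a difference in the underlying electronic structure.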

I am grateful to those who diligently answer my often insufficient questions.
I always end up writing long texts, and I am thankful that you take the time to read them with interest.

Aaron_Kaplan · August 18, 2025, 4:41pm

The DOS can be from either a PBE GGA or an r2SCAN meta-GGA calculation; you’d have to look at the task ID and then retrieve the task document corresponding to that task ID. The structure volume to normalize the DOS by is included with the DOS:

from mp_api.client import MPRester

with MPRester() as mpr:
    # get all electronic structure info:
    estruct_doc = mpr.materials.electronic_structure.search(material_ids=["mp-284"])[0]

    # Inspect task IDs associated with the electronic structure document
    print(f"DOS task ID = {estruct_doc.dos.total['1'].task_id}")
    print(f"Band structure task ID = {estruct_doc.task_id}")

    # Retrieve the task corresponding to the electronic DOS:
    dos_task = mpr.materials.tasks.search(task_ids=[estruct_doc.dos.total["1"].task_id])[0]

    # Get the electronic DOS:
    dos = mpr.get_dos_by_material_id("mp-284")

print(dos_task.task_id,dos_task.calc_type)
# Use pymatgen's features to normalize the DOS
normalized_dos = dos.get_normalized()

# or use the associated structure in the DOS to do so:
norm_vol = dos.structure.volume

Minju · August 28, 2025, 2:09am

Hello, Aaron.
I always thank you for helping me with my questions.

The structure volume to normalize the DOS by is included with the DOS

You mentioned this, so in the code, does the dos.get_normalized() part automatically perform normalization by volume using pymatgen, similar to how I directly divided dos.density by volume?
(based on my test results, it seems to be correct)

Lastly, if you could provide just one additional comment about the difference in the size of the DOS drawn from the SCF calculation results and the DOS obtained through the API, I would be grateful.

Some materials cannot be retrieved through the API because they lack DOS data, which suggests their electronic structure has not been updated in the database. These materials also have no band structure shown on the website.
I will list some of the materials without DOS below.

========================================
Missing DOS data:
========================================
Ac2In6 mp-866286
Ac2In6 mp-984785
Ac2In6 mp-1525815
Ac2In6 mp-864996
Ag16O8 mp-2047939
Ag4O4 mp-2803876
Ag3Te9 mp-1214894
Al2Re2 mp-10909
Al2Tc4 mp-1228001
B4Mo4 mp-629015
B4Pd12 mp-1105080
B2Pt4 mp-2049185
Ba6Sr2 mp-1183369
Ba8Al10 mp-1214488
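
A list like this can be collected with a small wrapper that also records why each lookup failed. This is only a sketch: `find_missing` is a hypothetical helper and `fake_fetch` is an invented stand-in. In real use, `fetch` could be something like `mpr.get_dos_by_material_id`, which is assumed here to either return None or raise when no DOS exists.

```python
def find_missing(material_ids, fetch):
    """Split IDs into those for which fetch returns data and those for which it does not.

    `fetch` is any callable taking a material ID; a return value of None or a
    raised exception is counted as missing.
    """
    found, missing = {}, []
    for mid in material_ids:
        try:
            result = fetch(mid)
        except Exception as exc:  # broad on purpose: a real client may raise various errors
            missing.append((mid, repr(exc)))
            continue
        if result is None:
            missing.append((mid, "no data returned"))
        else:
            found[mid] = result
    return found, missing

# Stand-in for a real lookup; the IDs and behaviour are invented for this demo
def fake_fetch(mid):
    data = {"mp-1": "dos-1", "mp-2": None}
    if mid not in data:
        raise KeyError(mid)
    return data[mid]

found, missing = find_missing(["mp-1", "mp-2", "mp-3"], fake_fetch)
print(found)
print(missing)
```

Keeping the failure reason next to each ID makes it easier to separate materials that truly have no DOS from transient request errors.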

Best regards.

Aaron_Kaplan · August 28, 2025, 3:50pm

You mentioned this, so in the code, does the dos.get_normalized() part automatically perform normalization by volume using pymatgen, similar to how I directly divided dos.density by volume?

Yes, it divides the DOS by the volume of the structure associated with the DOS (link). That’s a little hard to see from the code, but it just re-creates a DOS with the normalized densities.

Lastly, if you could provide just one additional comment about the difference in the size of the DOS drawn from the SCF calculation results and the DOS obtained through the API, I would be grateful.

Not all of these will have a band structure along the high-symmetry points in the BZ. That’s a specific calculation that doesn’t always get run.

But these should all have a DOS associated with them, I’ll have to look into that with the team. Thanks for the catch!

Minju · September 11, 2025, 8:27am

Thank you, @Aaron_Kaplan, for always giving me good advice.

I have some further questions about the VASP calculations I have been running recently.

Summary:

Does the initial structure of MPStaticSet use the relaxed crystal structure from MPRelaxSet?
When a structure imported via MPStaticSet is converted with get_primitive_structure(), is MAGMOM in the INCAR converted correctly?
Is MAGMOM stored as a site property within MPStaticSet?
Why do structures converted to a primitive cell often fail to converge or produce symmetry errors?
How can I obtain the exact structure actually used for the materials shown on the Materials Project website (the data obtainable via the API)?

- Detailed

for idx, structure_summary in enumerate(structures, 1):
    try:
        # --- 1. Data collect ---
        material_id = structure_summary.material_id  # MPID
        (…)
        # --- 2. Get structure and preprocessing ---
        structure = mpr.get_structure_by_material_id(material_id, final=use_initial_structure)
        if isinstance(structure, list):
            structure = structure[0]
        (…)
        primitive_cell = structure.get_primitive_structure()

Whether the MPStaticSet initial structure (POSCAR) and the MPRelaxSet final structure are the same
Whether MAGMOM (and the rest of the INCAR) is preserved when converting to a primitive cell
Whether site properties are retained during the structure transformation (do the site properties include the correct MAGMOM data?)
Frequent symmetry errors during VASP calculations with the converted primitive cell

My current understanding is that the API stores INCAR-related data such as MAGMOM as site properties, so that after conversion to a primitive cell, MAGMOM is written to match the number and order of atoms in the POSCAR, and the other INCAR options are loaded as well.
(Please let me know if this is incorrect.)

The symmetry error while using primitive cell

Additionally, which structures are the VASP calculation data provided by Materials Project based on?
The structures downloaded from the website seem to be conventional cells, but is there a way to see the structure actually used to compute each material property (band structure, magnetization, etc.), similar to the code that exposes the structure associated with the DOS?

This is the code I use to build the structure for the static calculation.
SCF_Structure_Builder.py (13.4 KB)