Mine uniform electronic band structures from Materials Project

Hi MP users and founders,

In this post, I will show how MP users can get uniform electronic band structures for a large number of materials. I also hope that the MP founders will pay attention to some problems and fix them.

Band structure data and other properties can be obtained through the API using a Materials Project ID (mp_id):

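from pymatgen.ext.matproj import MPRester
mpr = MPRester("key")  # replace "key" with your own API key
# mp_id is a Materials Project ID string; line_mode=False requests the uniform band structure
uni_bs = mpr.get_bandstructure_by_material_id(material_id=mp_id, line_mode=False)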

There are two approaches to get the list of mp_ids.

  1. Using the mp_id ranges published in Ricci, Francesco et al. (2018), Data from: An ab initio electronic transport database for inorganic materials, Dryad, Dataset, https://doi.org/10.5061/dryad.gn001. According to this paper, data for 48000 compounds are available.

Part 1: transport data for mp_id from 1 to 6565
Part 2: transport data for mp_id from 6566 to 14048
Part 3: transport data for mp_id from 14049 to 18774
Part 4: transport data for mp_id from 18775 to 23660
Part 5: transport data for mp_id from 23662 to 28155
Part 6: transport data for mp_id from 28156 to 30189
Part 7: transport data for mp_id from 30192 to 505772
Part 8: transport data for mp_id from 505774 to 542734
Part 9: transport data for mp_id from 542737 to 555975
Part 10: transport data for mp_id from 555976 to 558469
Part 11: transport data for mp_id from 558475 to 560944
Part 12: transport data for mp_id from 560947 to 567053
Part 13: transport data for mp_id from 567055 to 570909
Part 14: transport data for mp_id from 570910 to 601867
Part 15: transport data for mp_id from 601871 to 639724
Part 16: transport data for mp_id from 639727 to 667327
Part 17: transport data for mp_id from 667335 to 680604
Part 18: transport data for mp_id from 680610 to 698491
Part 19: transport data for mp_id from 698494 to 720300
Part 20: transport data for mp_id from 720312 to 753160
Part 21: transport data for mp_id from 753162 to 761565
Part 22: transport data for mp_id from 761566 to 764784
Part 23: transport data for mp_id from 764785 to 768341
Part 24: transport data for mp_id from 768342 to 771645
Part 25: transport data for mp_id from 771647 to 775067
Part 26: transport data for mp_id from 775068 to 779221
Part 27: transport data for mp_id from 779222 to 861898
Part 28: transport data for mp_id from 861906 to 1006278

I divided the query into 28 parts as well, corresponding to the 28 mp_id ranges listed above. I obtained 38265 uniform band structures, so 9735 (~20%) of the 48000 materials are missing.
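The per-range download described above can be sketched roughly as follows. This is only a minimal sketch under some assumptions: that every integer in a range is tried as "mp-<number>", and that an exception or a None result means no uniform band structure is available for that ID. The helper name collect_band_structures and the fetch argument are hypothetical and not part of pymatgen; fetch stands in for the get_bandstructure_by_material_id call shown earlier.

```python
from typing import Any, Callable, Dict, List, Tuple


def collect_band_structures(
    fetch: Callable[[str], Any],
    start: int,
    stop: int,
) -> Tuple[Dict[str, Any], List[str]]:
    """Try every mp-<n> with start <= n <= stop and return (found, missing)."""
    found: Dict[str, Any] = {}
    missing: List[str] = []
    for number in range(start, stop + 1):
        mp_id = f"mp-{number}"
        try:
            band_structure = fetch(mp_id)
        except Exception:
            # Assumption: any error means there is no data for this ID.
            band_structure = None
        if band_structure is None:
            missing.append(mp_id)
        else:
            found[mp_id] = band_structure
    return found, missing


# Hypothetical usage for Part 1 (needs network access and a valid API key):
# from pymatgen.ext.matproj import MPRester
# mpr = MPRester("key")
# fetch = lambda mp_id: mpr.get_bandstructure_by_material_id(
#     material_id=mp_id, line_mode=False)
# found, missing = collect_band_structures(fetch, 1, 6565)
```

Keeping the list of missing IDs for each part makes it easier to report exactly which materials have no uniform band structure. Note that the later ranges span many integers, so trying every number sends a large number of requests.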

  2. Using the API to screen for mp_ids that have band structures first, before downloading the data

from pymatgen.ext.matproj import MPRester
mpr = MPRester("key")
entries = mpr.get_entries({"has_bandstructure": True})

# Collect the mp_ids as one comma-separated line
lines = ""
for entry in entries:
    lines += entry.entry_id + ","

# Write the line to disk; the file is closed automatically
with open("possible_bs_mpids.dat", "w") as f:
    f.write(lines + "\n")

There are 52827 mp_ids saved in the possible_bs_mpids.dat file, but I obtained only 39782 uniform band structures, about 75% of the expected data.

A quick check shows that the two approaches do not return the same set of mp_ids: 1326 compounds appear in approach 1 but not in approach 2.
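For anyone who wants to repeat this kind of check, here is a minimal sketch of the comparison using Python sets. The function names parse_id_line and only_in_first are hypothetical, and the IDs below are toy values, not my real lists. It assumes the IDs from approach 2 are stored as one comma-separated line, as in possible_bs_mpids.dat.

```python
from typing import Iterable, Set


def parse_id_line(text: str) -> Set[str]:
    """Split a comma-separated line of mp_ids, ignoring empty items."""
    return {item.strip() for item in text.split(",") if item.strip()}


def only_in_first(first: Iterable[str], second: Iterable[str]) -> Set[str]:
    """Return the IDs that appear in `first` but not in `second`."""
    return set(first) - set(second)


# Toy data standing in for the real ID lists of the two approaches.
approach_1_ids = ["mp-1", "mp-2", "mp-3"]
approach_2_ids = parse_id_line("mp-2,mp-3,mp-4,\n")

print(sorted(only_in_first(approach_1_ids, approach_2_ids)))  # ['mp-1']
```

With the real data, the second set would come from reading possible_bs_mpids.dat and the first from the IDs successfully downloaded with approach 1.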