Hi all,
I was analyzing the distribution of elements in the Materials Project today when I noticed something.
When I query the database for the `material_id`s and `elements` of all materials and then select the ones containing hydrogen, I get 10,449 materials.
When I instead ask the API directly for materials containing hydrogen (`elements=["H"]`), I get 10,394 materials, which matches the website.
When I look up the 55 IDs that appear only in the first result set, the search returns nothing.
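The comparison itself is just a set difference between the two ID collections, and it is worth checking both directions. A minimal sketch with placeholder IDs (made up for illustration, not real query results):

```python
# Placeholder IDs for illustration only; not actual query results.
ids_from_full_download = {"mp-1", "mp-2", "mp-3", "mp-4"}  # approach 1: filter locally
ids_from_h_query = {"mp-1", "mp-2", "mp-3"}                # approach 2: elements=["H"]

# IDs that only the first approach reports as containing H
only_in_first = sorted(ids_from_full_download - ids_from_h_query)
# IDs that only the second approach returns (also worth checking)
only_in_second = sorted(ids_from_h_query - ids_from_full_download)

print(only_in_first)   # ['mp-4']
print(only_in_second)  # []
```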
Here’s the code to reproduce this. Sorry if it’s messy, it was originally a Jupyter notebook:
```python
from mp_api.client import MPRester

API_KEY = ""

# Approach 1: download material_id and elements for every material,
# then filter locally by atomic number.
with MPRester(API_KEY) as mpr:
    docs = mpr.materials.summary.search(fields=["material_id", "elements"])

mp_elements = {}
for doc in docs:
    mp_elements[doc.material_id.string] = [element.number for element in doc.elements]

cutoff_element = 18  # Argon (end of the third period)
materials_by_element = {}
for element in range(1, cutoff_element + 1):
    materials_by_element[element] = []
    for mp_id, elements in mp_elements.items():
        if element in elements:
            materials_by_element[element].append(mp_id)

counts = {element: len(mp_ids) for element, mp_ids in materials_by_element.items()}
print("counts per element", counts)

# Approach 2: let the API filter for materials containing hydrogen.
with MPRester(API_KEY) as mpr:
    docs = mpr.materials.summary.search(
        elements=["H"], fields=["material_id", "formula_pretty"]
    )
mpid_formula_dict = {doc.material_id: doc.formula_pretty for doc in docs}
print("number of materials from approach 2:", len(docs))

# IDs that approach 1 says contain H but approach 2 does not return.
h_ids = {mpid.string for mpid in mpid_formula_dict}
weird_ids = [mp_id for mp_id in materials_by_element[1] if mp_id not in h_ids]
print("number of weird IDs:", len(weird_ids))
print(weird_ids)

# Try to fetch the mismatched IDs directly.
with MPRester(API_KEY) as mpr:
    docs = mpr.materials.summary.search(material_ids=weird_ids)
print("downloaded weird ID docs:", docs)
```
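Side note: the per-element loop above walks over every material once per element (18 passes over ~155k entries). A single pass with `collections.defaultdict` builds the same `materials_by_element` mapping. A sketch, where the small `mp_elements` dict is a made-up stand-in for the real one:

```python
from collections import defaultdict

# Hypothetical stand-in for the real mp_elements dict:
# material ID -> list of atomic numbers in that material.
mp_elements = {
    "mp-a": [1, 8],
    "mp-b": [3, 8, 26],
    "mp-c": [1, 6, 7, 8],
}
cutoff_element = 18  # Argon

# One pass over all materials; set() avoids double-counting a repeated Z.
materials_by_element = defaultdict(list)
for mp_id, elements in mp_elements.items():
    for z in set(elements):
        if z <= cutoff_element:
            materials_by_element[z].append(mp_id)

# Include elements with zero materials, as the original loop does.
counts = {z: len(materials_by_element.get(z, [])) for z in range(1, cutoff_element + 1)}

print(materials_by_element[1])  # ['mp-a', 'mp-c']
print(counts[8])                # 3
```

This does not change the result, only the runtime, so it does not explain the discrepancy.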
Here’s the output, including the hydrogen-containing IDs I can’t find with the second approach:
```text
Retrieving SummaryDoc documents:
155361/? [06:52<00:00, 430.81it/s]
counts per element {1: 10449, 2: 8, 3: 21761, 4: 1189, 5: 6370, 6: 9083, 7: 11442, 8: 82406, 9: 12136, 10: 1, 11: 12873, 12: 19084, 13: 7805, 14: 12758, 15: 16913, 16: 15397, 17: 6425, 18: 2}
Retrieving SummaryDoc documents: 100%
10394/10394 [00:02<00:00, 3904.72it/s]
number of materials from approach 2: 10394
number of weird IDs: 55
['mp-697915', 'mp-1187975', 'mp-632667', 'mp-634930', 'mp-634751', 'mp-864603', 'mp-625103', 'mp-626421', 'mp-632348', 'mp-1070852', 'mp-2646948', 'mp-1025273', 'mp-1103732', 'mp-626413', 'mp-643108', 'mp-1207586', 'mp-740759', 'mp-1206323', 'mp-1018646', 'mp-1018647', 'mp-1187892', 'mp-1207571', 'mp-1207559', 'mp-979964', 'mp-1198634', 'mp-1195507', 'mp-1105386', 'mp-1216487', 'mp-643246', 'mp-1195012', 'mp-1195544', 'mp-1203501', 'mp-643071', 'mp-1200022', 'mp-705525', 'mp-555985', 'mp-1202633', 'mp-1202946', 'mp-1202882', 'mp-1198247', 'mp-1238179', 'mp-1200794', 'mp-1191250', 'mp-697925', 'mp-699393', 'mp-1212344', 'mp-722346', 'mp-1203140', 'mp-1200555', 'mp-1202119', 'mp-1193866', 'mp-1190437', 'mp-1198865', 'mp-1200481', 'mp-1200272']
Retrieving SummaryDoc documents:
0/0 [00:00<?, ?it/s]
downloaded weird ID docs: []
```
The same discrepancy shows up for other elements. I can supply more IDs if you’d like.
Please let me know if I’m missing something here!
Thanks