Problem
Including a search field in mpr.summary.search as a kwarg but not in fields returns SummaryDoc objects with default (often incorrect, non-None) values of that attribute. This leads to weird scenarios like searching for non-theoretical materials and getting back the correct subset, but with all of their theoretical attributes set to True.
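To make the mechanism concrete, here is a minimal sketch of what I believe is going on. ToySummaryDoc is a hypothetical stand-in I wrote for illustration, not the real mp-api SummaryDoc, and the material ID is a placeholder. The sketch assumes the document model has a default of True for theoretical, so a field that was never returned by the server silently takes that default:

```python
from dataclasses import dataclass


@dataclass
class ToySummaryDoc:
    # Hypothetical stand-in for SummaryDoc; NOT the real mp-api model.
    material_id: str
    theoretical: bool = True  # assumed default, used when the field is absent


# Simulated response for a query filtered on theoretical=False where
# "theoretical" was not listed in fields, so it is missing from the payload.
payload = {"material_id": "mp-placeholder"}

doc = ToySummaryDoc(**payload)
print(doc.theoretical)  # the default, not the value the server filtered on
```

The filter is applied correctly on the server side in this picture; only the client-side object is misleading.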
Example
If I run the following code:
from mp_api import MPRester
with MPRester(api_key='UR_API_KEY') as mpr:
    docs = mpr.summary.search(theoretical=False, fields=["energy_above_hull", "formula", "material_id"])
docs = [d for d in docs]
print("n materials", len(docs))
print("n theoretical materials:", len([d for d in docs if d.theoretical]))
I get:
/Users/ardunn/alex/lbl/projects/common_env/textenv_py310/lib/python3.10/site-packages/mp_api/client.py:138: builtins.UserWarning: Problem loading MPContribs client: Duplicate operationId: download_entries
Retrieving SummaryDoc documents: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 49794/49794 [00:09<00:00, 5094.30it/s]
n materials 49794
n theoretical materials: 49794
Perplexingly, my query of non-theoretical materials returned all theoretical materials(!)
After actually looking at some of these entries and doing some more queries, I realized these 49k entries were in fact not theoretical, but their theoretical attribute is being set incorrectly. Note that I did not include theoretical as a field to be returned, and yet for all the entries it returned True…
If I do the same query, but this time including theoretical in the fields arg, I actually do get the correct theoretical attrs:
from mp_api import MPRester
with MPRester(api_key='UR_API_KEY') as mpr:
    docs = mpr.summary.search(theoretical=False, fields=["energy_above_hull", "formula", "material_id", "theoretical"])
docs = [d for d in docs]
print("n materials", len(docs))
print("n theoretical materials:", len([d for d in docs if d.theoretical]))
/Users/ardunn/alex/lbl/projects/common_env/textenv_py310/lib/python3.10/site-packages/mp_api/client.py:138: builtins.UserWarning: Problem loading MPContribs client: Duplicate operationId: download_entries
Retrieving SummaryDoc documents: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 49794/49794 [00:09<00:00, 5375.82it/s]
n materials 49794
n theoretical materials: 0
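A possible client-side workaround, as a rough sketch: always add the names of the filter kwargs to fields before searching. The helper fields_with_filters below is my own hypothetical function, not part of mp-api, and it assumes each filter kwarg has the same name as the document attribute it filters on (true for theoretical, but I have not checked every kwarg).

```python
def fields_with_filters(fields, filters):
    """Return a copy of fields with any missing filter names appended."""
    merged = list(fields)
    for name in filters:
        if name not in merged:
            merged.append(name)
    return merged


filters = {"theoretical": False}
fields = fields_with_filters(
    ["energy_above_hull", "formula", "material_id"], filters
)
print(fields)

# Intended use (needs mp-api and an API key, so it is not run here):
# docs = mpr.summary.search(**filters, fields=fields)
```

This way the returned documents carry the real value for every attribute that was filtered on, at the cost of a slightly larger response.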
The Point
I’m not sure if this is intended behavior for the API (since according to the API doc the default value of theoretical is True), but doesn't this behavior seem a bit confusing? In contrast, why not have the default value just be None?
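As a sketch of what I mean by a None default (ToyDoc is a hypothetical model for illustration, not the actual mp-api class, and the material IDs are placeholders): with None as the default, an attribute that was not requested can be told apart from one that was actually returned as False.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyDoc:
    # Hypothetical model; None means "this field was not returned".
    material_id: str
    theoretical: Optional[bool] = None


docs = [
    ToyDoc("mp-a"),                     # theoretical not requested
    ToyDoc("mp-b", theoretical=False),  # theoretical requested and returned
]

# An explicit "is None" check separates missing values from real False values.
unknown = [d for d in docs if d.theoretical is None]
print("n with theoretical not returned:", len(unknown))
```

A truthiness check like `if d.theoretical` would still lump None and False together, so the end user would need the explicit `is None` test, but at least no document would claim a value it never received.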
Apologies if this is the way things are supposed to work; it just seemed unintuitive to an end user.
Using version 0.24.4 of mp-api in Python 3.10.1 on macOS Monterey 12.4.