I’m trying to migrate from the legacy MPRester to the new one.
Previously, I could grab the properties of all materials like this:
with MPRester(api_key) as mpr:
    data = mpr.query(
        criteria={"task_id": {"$exists": True}},
        properties=["material_id", "final_energy", "structure"],
    )
How would I do this in the new API with mpr.summary.search? It doesn’t look like MongoDB queries are accepted anymore.
I think that’s exactly it – .search() will return all documents by default, and then you can ask for whatever fields you would like. This should be efficient and will paginate automatically.
I think we should probably add an FAQ to our docs too, to explain why the MongoDB queries were removed for people who were using them. If there are any operators/searches that used to be possible and are no longer possible, it should be straightforward for us to add them back, so feedback like this is very welcome.
Thanks for the reply! Good to know I’m on the right track haha.
But I’m actually running into issues with the .search() approach. It’s not super efficient and is also unstable for “larger” queries.
When I try to grab all material IDs, I use the following:
mpr.summary.search(fields=["material_id"])
But this takes 29 min to run (even with >800 Mbps download speed and a powerful PC).
As for the stability, the following search keeps failing with a 502 error code. It does this at a random chunk (sometimes at 10% progress, other times at 60%, etc.) and I haven’t been able to execute the query successfully yet:
fields_to_load = [
    "material_id",
    "last_updated",
    "structure",
    "uncorrected_energy_per_atom",
    "energy_per_atom",
    "is_magnetic",
    "total_magnetization",
    "theoretical",
]
data = mpr.summary.search(
    all_fields=False,
    fields=fields_to_load,
    deprecated=False,
    # I've been trying smaller chunk sizes too
    # chunk_size=1000,
)
I might open a feature request to specify the chunk_start as well – so I don’t have to restart this query each time it fails.
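Until something like chunk_start exists, one possible workaround is to fetch the material IDs first and then request the heavier fields in batches of IDs, retrying only the batch that failed. The sketch below is a rough illustration, not part of the mp-api client: search_in_batches, batched, batch_size, max_retries and wait_seconds are hypothetical names, and it assumes that mpr.summary.search accepts a material_ids list and that a 502 surfaces as an ordinary Python exception.

```python
import time


def batched(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def search_in_batches(search, material_ids, fields,
                      batch_size=1000, max_retries=3, wait_seconds=5.0):
    """Call `search` once per batch of IDs, retrying a failed batch on its own.

    `search` is any callable taking `material_ids` and `fields` keyword
    arguments (assumed here to be something like mpr.summary.search).
    """
    results = []
    for batch in batched(material_ids, batch_size):
        for attempt in range(1, max_retries + 1):
            try:
                results.extend(search(material_ids=batch, fields=fields))
                break  # this batch succeeded, move on to the next one
            except Exception:
                # Assumption: a 502 from the server is raised as an exception.
                if attempt == max_retries:
                    raise
                time.sleep(wait_seconds)
    return results
```

With an open client this would be called roughly as search_in_batches(mpr.summary.search, all_ids, fields_to_load), where all_ids is the list of IDs from the cheaper material_id-only query. A failure then costs one batch rather than the whole download.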
Sorry if this is bad news or creates problems! I’d appreciate any help though.
This appears to be a client issue. I am working on a fix now.
For now, you can instantiate MPRester with use_document_model=False to avoid the post-processing that is causing the problem. This will allow your first query to run in 10-20 seconds.
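A minimal sketch of that workaround is below. The helper names fetch_material_ids and download_material_ids are made up for illustration, and the code assumes that with use_document_model=False each result comes back as a plain dict rather than a document object.

```python
def fetch_material_ids(mpr):
    """Return every material ID as a string from an open MPRester-like client."""
    # Assumption: with use_document_model=False each result is a plain dict.
    docs = mpr.summary.search(fields=["material_id"])
    return [str(doc["material_id"]) for doc in docs]


def download_material_ids(api_key):
    """Open a client with document-model post-processing disabled and fetch IDs."""
    # Imported here so the helper above can be used without mp-api installed.
    from mp_api.client import MPRester

    with MPRester(api_key, use_document_model=False) as mpr:
        return fetch_material_ids(mpr)
```

Calling download_material_ids(api_key) should then return a plain list of ID strings.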