Access properties from all structures in new API

jacksund · August 25, 2022, 3:35pm

Hi everyone,

I’m trying to migrate from the legacy MPRester to the new one.

Previously, I could grab the properties of all materials like this:

from pymatgen.ext.matproj import MPRester  # legacy client

with MPRester(api_key) as mpr:
    data = mpr.query(
        criteria={"task_id": {"$exists": True}},
        properties=["material_id", "final_energy", "structure"],
    )

How would I do this in the new API with mpr.summary.search? It doesn’t look like MongoDB queries are accepted anymore.

Thanks in advance for the help!

-Jack

jacksund · August 25, 2022, 3:40pm

Btw the following works:

mpr.summary.search(fields=["material_id", ...])

But I’m not sure if this is the preferred method or if there’s something more efficient/recommended.

mkhorton · August 26, 2022, 11:10pm

Hi @jacksund,

I think that’s exactly it – .search() will return all documents by default, and then you can ask for whatever fields you would like. This should be efficient and will paginate automatically.

I think we should probably add a FAQ to our docs too, to explain why the MongoDB queries were removed for people who were using them. If there are any operators/searches that used to be possible and are no longer possible, it should be straightforward for us to add them back, so feedback like this is very welcome.

Best,

Matt

jacksund · August 26, 2022, 11:30pm

Thanks for the reply! Good to know I’m on the right track haha.

But I’m actually running into issues with the .search() approach. It’s not super efficient and is also unstable for “larger” queries.

When I try to grab all material IDs, I use the following:

mpr.summary.search(fields=["material_id"])

But this takes 29 min to run (even with >800 Mbps download speed + a bulky PC).

As for the stability, the following search keeps failing with a 502 error code. It does this at a random chunk (sometimes at 10% progress, other times at 60%, etc.) and I haven’t been able to execute the query successfully yet:

fields_to_load = [
    "material_id",
    "last_updated",
    "structure",
    "uncorrected_energy_per_atom",
    "energy_per_atom",
    "is_magnetic",
    "total_magnetization",
    "theoretical",
]
data = mpr.summary.search(
    all_fields=False,
    fields=fields_to_load,
    deprecated=False,
    # I've been trying smaller chunk sizes too
    # chunk_size=1000,
)

I might open a feature request to specify the chunk_start as well – so I don’t have to restart this query each time it fails.
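
In the meantime, a client-side workaround could be to split the work into smaller pieces and retry each piece when it fails, so a single 502 doesn't throw away everything downloaded so far. The sketch below is only an illustration of that idea: `fetch_chunk` is a hypothetical callable standing in for whatever request fetches one piece (it is not part of mp_api), and the retry count and delays are arbitrary.

```python
import time


def fetch_with_retries(fetch_chunk, chunk_ids, max_retries=3, base_delay=1.0):
    """Call fetch_chunk(chunk_id) for each id, retrying chunks that fail.

    fetch_chunk is a hypothetical user-supplied callable (not an mp_api
    function) that returns a list of records or raises on failure.
    """
    results = []
    for chunk_id in chunk_ids:
        for attempt in range(max_retries + 1):
            try:
                results.extend(fetch_chunk(chunk_id))
                break  # this chunk succeeded, move on to the next one
            except Exception:
                # Broad catch for brevity; narrow this to the client's
                # actual error type in real use.
                if attempt == max_retries:
                    raise  # give up after the last allowed retry
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                time.sleep(base_delay * 2**attempt)
    return results
```

Chunks that already succeeded are kept in `results`, so only the failing piece is requested again.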

Sorry if this is bad news or creates problems! I’d appreciate any help though.

munrojm · August 31, 2022, 10:05pm

Hi @jacksund,

This appears to be a client issue. I am working on a fix now.

For now, you can instantiate MPRester with use_document_model=False to avoid the post-processing that is causing the problem. This will allow your first query to run in 10-20 seconds.
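
As a rough sketch of that workaround: the function name and the `api_key` argument are placeholders, the import path is the one used by recent mp_api releases and may differ in older versions, and this assumes a client where `mpr.summary.search` is available as in the queries above.

```python
def fetch_material_ids(api_key):
    """Return all material IDs, using plain dicts instead of document models."""
    # Imported here so the sketch only needs mp_api when it is actually called.
    from mp_api.client import MPRester

    # use_document_model=False skips building a document object for every
    # result, so each entry should come back as a plain dict.
    with MPRester(api_key, use_document_model=False) as mpr:
        docs = mpr.summary.search(fields=["material_id"])
    return [doc["material_id"] for doc in docs]
```

Because the results are dicts in this mode, fields are read with `doc["material_id"]` rather than as attributes.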

– Jason

munrojm · September 1, 2022, 12:47am

@jacksund I have just resolved the speed issue. The latest version of the client mp_api==0.26.4 has the fix.

– Jason

jacksund · September 1, 2022, 3:46pm

Awesome! Thanks for the fix and quick turnaround! Also, I overlooked the use_document_model input – that will come in handy.

Thanks again,
Jack