MPRester - Pulling data using a list of material_ids

I’m in the process of creating a set of data pulled from Materials Project using pymatgen’s MPRester.

What I have: A list of 50,000 relevant material_ids (obtained from a prior MPRester search by composition, which retrieved material_id among other properties, and then filtered further by other methods).
What I want: Properties X and Y for each of those material_ids.
What I tried:

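```python
from pymatgen.ext.matproj import MPRester  # pymatgen's (legacy API) MPRester

ids = [id1, id2, ..., id50000]  # placeholder for the 50,000 material_ids
with MPRester(api_key=API_KEY) as mpr:
    mpresults = mpr.query(criteria={'material_id': {'$in': ids}}, properties=["X", "Y"])
```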

What’s wrong: I requested X and Y for 50,000 entries, but mpresults contains only ~30,000. I isolated the ids missing from mpresults and re-ran the query with just those, and it returned an empty list.
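For reference, isolating the missing ids can be sketched as below. Note that the query above only requests X and Y, so material_id has to be added to properties for the results to be matched back to the requested ids. find_missing_ids is a hypothetical helper, and the sample ids and values are made up.

```python
def find_missing_ids(requested_ids, results, key="material_id"):
    """Return the requested ids that do not appear in the query results.

    `results` is assumed to be a list of dicts (the shape the legacy
    MPRester.query returns), with `key` among the requested properties.
    """
    returned = {doc[key] for doc in results}
    # Keep the original order of the requested ids.
    return [mid for mid in requested_ids if mid not in returned]


# Hypothetical example data:
requested = ["mp-1", "mp-2", "mp-3"]
results = [{"material_id": "mp-1", "X": 0.5}, {"material_id": "mp-3", "X": 1.2}]
print(find_missing_ids(requested, results))  # ['mp-2']
```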

I suspect this is related to the new database schema, since the CIF files were retrieved and written some months ago.

I would rather not recreate the dataset, so is there any way to correct the mp-ids in my existing CIF files?

Hi @kdmiller,

Welcome to the forum!

Please post a specific mp-id so we can investigate and give a more specific response. One thing that might help is changing the criteria to criteria={'task_ids': {'$in': ids}}, which can help if an mp-id was re-assigned. You can also use get_task_data.
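A minimal sketch of querying on task_ids, assuming the legacy pymatgen MPRester.query(criteria=..., properties=...) call used in the question. The helper name query_by_task_ids and the chunk size of 1000 are my own choices (hypothetical, not part of pymatgen); splitting the $in list is only a precaution to keep each request small, not a documented limit.

```python
def query_by_task_ids(mpr, ids, properties, chunk_size=1000):
    """Query on task_ids in chunks via the legacy MPRester.query().

    `mpr` is assumed to be an open pymatgen.ext.matproj.MPRester, or any
    object with a compatible query(criteria=..., properties=...) method.
    """
    # Always request material_id and task_ids so each result can be traced
    # back to the id that matched it; dict.fromkeys drops duplicates in order.
    props = list(dict.fromkeys(["material_id", "task_ids", *properties]))
    results = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        results.extend(
            mpr.query(criteria={"task_ids": {"$in": chunk}}, properties=props)
        )
    # A material can match ids in more than one chunk, so results may contain
    # duplicates; de-duplicate on material_id if that matters downstream.
    return results


# Usage (requires network access and an API key):
# from pymatgen.ext.matproj import MPRester
# with MPRester(api_key=API_KEY) as mpr:
#     mpresults = query_by_task_ids(mpr, ids, ["X", "Y"])
```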

Re-assigning mp-ids does not happen often, but it has been unavoidable in some instances. In these cases, the old mp-id redirects to the new mp-id on the website, and via the API you can query task_ids, which searches across all mp-ids associated with a given material.
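Since the original question is about correcting the mp-ids on existing files, here is a hedged sketch of turning the results of a task_ids query into a lookup from each (possibly outdated) id to the current material_id. It assumes each result dict contains both material_id and task_ids, i.e. that both were requested as properties; map_old_to_current is a hypothetical helper and the sample data is made up. How the ids are stored in the CIF files (file names, data block names) isn't stated in the thread, so applying the mapping is left out.

```python
def map_old_to_current(requested_ids, results):
    """Map each requested mp-id to the current material_id it resolves to.

    `results` is assumed to be a list of dicts with "material_id" and
    "task_ids" keys. Ids that match no result are absent from the mapping.
    """
    requested = set(requested_ids)
    mapping = {}
    for doc in results:
        current = doc["material_id"]
        # A requested id matches either the current id itself or one of the
        # task ids grouped under this material.
        for tid in [current, *doc.get("task_ids", [])]:
            if tid in requested:
                mapping[tid] = current
    return mapping


# Hypothetical example: "mp-7" was folded into material "mp-10".
results = [{"material_id": "mp-10", "task_ids": ["mp-10", "mp-7"]}]
print(map_old_to_current(["mp-7", "mp-10", "mp-99"], results))
```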

Note that even with the new database release, no data was removed from the API, and all data previously available should still be accessible in some capacity.




Thanks for the welcome and the rapid response! Switching to task_ids helped: I now retrieve a much larger portion of the original 50,000 entries (about 600 are still missing).

Here are a few of the missing material_ids:

[ ‘mp-1023920’,