MPRester - Pulling data using a list of materials_ids

I’m in the process of creating a set of data pulled from Materials Project using pymatgen’s MPRester.

What I have: A list of 50,000 relevant materials_id (obtained by a prior MPRester search based on composition – retrieving materials_id among other properties – which was further filtered using other methods).
What I want: X and Y properties for each of those materials_id
What I tried:

from pymatgen.ext.matproj import MPRester  # legacy Materials Project API client

ids = [id1, id2, ..., id50000]
with MPRester(api_key=API_KEY) as mpr:
    mpresults = mpr.query(criteria={'material_id':{'$in':ids}}, properties=["X", "Y"])

What’s wrong: I asked for (X, Y) for 50,000 entries but mpresults has only ~30,000 entries. I isolated the list of ids which are missing from mpresults and retried the query, which returned an empty list.
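For reference, isolating the missing ids comes down to a set difference along these lines. This is a minimal sketch rather than my exact script: find_missing_ids is a name made up for illustration, the ids and values are placeholders, and it assumes "material_id" was included in properties so that each returned document carries its id.

```python
def find_missing_ids(requested_ids, results, id_field="material_id"):
    """Return the requested ids that do not appear in the results, in input order."""
    # Collect every id that actually came back from the query.
    returned = {doc[id_field] for doc in results if id_field in doc}
    return [mid for mid in requested_ids if mid not in returned]


# Placeholder data; real results would come from mpr.query(...).
requested = ["mp-a", "mp-b", "mp-c"]
results = [
    {"material_id": "mp-a", "X": 0.1},
    {"material_id": "mp-c", "X": 0.3},
]
missing = find_missing_ids(requested, results)
print(missing)
```

Re-running the query with only the ids in missing is the retry that came back empty.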

I suspect this might have something to do with the new database schema since the cif files were retrieved and written some months ago.

I would rather not recreate the data set, so is there any way to correct the mp-ids on my existing cif files?

Hi @kdmiller,

Welcome to the forum!

You should post a specific mp-id so we’re able to investigate and give a more specific response. One thing that might help is changing the criteria to criteria={'task_ids':{'$in':ids}}, which can help if an mp-id was re-assigned. You can also use get_task_data.

Re-assigning mp-ids does not happen often but has been unavoidable in some instances. In these cases, the old mp-id will redirect to the new mp-id on the website, and via the API you can query task_ids, which searches across all mp-ids associated with a given material.
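As a rough sketch of what that could look like (not run against the live API): query_by_task_ids, map_to_current_ids, and the batch size of 1000 are illustrative choices and not part of pymatgen. The sketch assumes mpr is an open legacy MPRester and that "material_id" and "task_ids" can be requested as properties alongside your own.

```python
def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def query_by_task_ids(mpr, ids, properties, chunk_size=1000):
    """Query `task_ids` in batches; return one document per current material_id.

    `mpr` is assumed to be an open legacy pymatgen MPRester, or any object
    with a compatible .query(criteria=..., properties=...) method.
    """
    # Always request material_id and task_ids so old ids can be mapped later.
    wanted = list(dict.fromkeys(["material_id", "task_ids", *properties]))
    docs = {}
    for batch in chunked(ids, chunk_size):
        for doc in mpr.query(criteria={"task_ids": {"$in": batch}}, properties=wanted):
            # Two old ids in different batches may point at the same material.
            docs[doc["material_id"]] = doc
    return list(docs.values())


def map_to_current_ids(ids, docs):
    """Map each requested id to its current material_id, or None if not found."""
    current = {}
    for doc in docs:
        for task_id in doc.get("task_ids", []):
            current[task_id] = doc["material_id"]
    return {old: current.get(old) for old in ids}


# Usage sketch (needs pymatgen and a valid API key, so it is left commented out):
# from pymatgen.ext.matproj import MPRester
# with MPRester(api_key=API_KEY) as mpr:
#     docs = query_by_task_ids(mpr, ids, ["X", "Y"])
# old_to_new = map_to_current_ids(ids, docs)
```

Any id that maps to None was not found under task_ids either, and those are the ones worth posting here.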

Note that even with the new database release, no data was removed from the API, and all data previously available should still be accessible in some capacity.

Best,

Matt

Thanks for the welcome and the rapid response! I tried using task_ids and it helped: I retrieved a much larger share of the original 50,000 entries, though about 600 are still missing.

Here are a few of the materials_ids that are still missing:

['mp-1023920',
 'mp-1024044',
 'mp-1024045',
 'mp-1024046',
 'mp-1024068',
 'mp-1029576',
 'mp-1029673',
 'mp-1029835',
 'mp-1029838',
 'mp-1029923',
 'mp-1029960',
 'mp-1030618',
 'mp-1030631',
 'mp-1030636',
 'mp-1030710',
 'mp-1030726',
 'mp-1030814']

Thanks,
Kyle