MPRester doesn't seem to work with historical mp-ids that were re-assigned

Anubhav_Jain · June 11, 2020, 11:10pm

I have an older spreadsheet referencing “mp-769834”. Actually, this is in the electronic transport database in MPContribs.

If I search the MP web site for this MP id, I get back results for a different MP id - “mp-761284”. I presume, but did not confirm, that this is actually a proper redirect to the same material but a new id was assigned. So the web site is redirecting properly.

But if I try to get data via MPRester, it doesn’t work unless I use the new id:

from pymatgen import MPRester

mpr = MPRester()
print(mpr.get_data("mp-761284", prop="e_above_hull")[0])  # new ID works
print(mpr.get_data("mp-769834", prop="e_above_hull")[0])  # old ID fails

Anubhav_Jain · June 11, 2020, 11:13pm

Btw two notes on this:

1. It is fairly common - I am running a script right now but am already up to 500 instances where this error occurs
2. I am aware that I can use the function:

mpr.get_materials_id_from_task_id("mp-769834")

to convert the old ID to the new ID prior to looking up the data. But I suspect few people will know that they can pass an old materials ID to this lookup function, which is named for task IDs. Rather, they are just going to assume that the MP data disappeared.

mkhorton · July 16, 2020, 6:47pm

Hi Anubhav, I’ve added some better error handling and re-direction logic to get_structure_by_material_id (in master, not released) as a stop-gap measure. The new API will hopefully handle this a lot more gracefully. I’ll add a better doc page for this too, probably after the workshop. Agreed it’s confusing …

mkhorton · July 16, 2020, 6:48pm

Also for anyone using MPRester.query the ‘cheat’ method is just to query using the task_ids field rather than task_id or material_id since this will match all task_ids for a given material. I imagine you might already know this but commenting for others reading this thread.
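For readers who want to try the task_ids approach, here is a minimal sketch. It assumes the legacy pymatgen MPRester, whose query method takes criteria and properties arguments. The helper name query_by_any_task_id and its default property list are hypothetical, chosen only for illustration.

```python
def query_by_any_task_id(mpr, mp_id, properties=("material_id", "e_above_hull")):
    """Query by the task_ids field so that an old or a new id both match.

    `mpr` is assumed to be a legacy pymatgen MPRester instance (or any
    object with a compatible `query(criteria, properties)` method).
    """
    # task_ids lists every task id attached to a material, so a
    # re-assigned (historical) id should still find the current entry.
    return mpr.query(criteria={"task_ids": mp_id}, properties=list(properties))
```

Calling query_by_any_task_id(MPRester(), "mp-769834") should then return a list of dicts whose material_id field holds the current id, if the old id is still recorded among that material's task ids.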

Anubhav_Jain · July 16, 2020, 10:54pm

Should the default just be to query task_ids when someone specifies a material_id in get_data? The only downside I see is potential slowness, but an advanced user (or MP backend code) could still specify a query dictionary for {material_id: x} to avoid this if they know what they are doing.

mkhorton · July 16, 2020, 11:08pm

Perhaps. There is some muddying of the waters here, e.g. if someone uses get_task_data and get_data for the same id and gets differing results (already an issue). I think using task_id as an alias for materials_id is where much of the confusion comes from.

I could add similar code for get_data though, which is basically: (1) try the supplied id, (2) if no data is found, look up to see if the task id matches a canonical mp-id, and (3) if it does, return that data but give a warning.

The only issue is 1000s of warnings if you do this in bulk. I’d prefer the warnings compared to the case of silently catching it though, because at least the user is informed.
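The three steps above can be sketched roughly as follows. This is an illustration, not the code that is in pymatgen: the wrapper name get_data_with_redirect is hypothetical, and it assumes that get_data returns an empty list for an unknown id and that get_materials_id_from_task_id returns None (rather than raising) when nothing matches.

```python
import warnings


def get_data_with_redirect(mpr, mp_id, prop=""):
    """Try the supplied id first, then fall back to its canonical mp-id.

    `mpr` is assumed to be a legacy pymatgen MPRester instance.
    """
    # (1) Try the id exactly as supplied.
    data = mpr.get_data(mp_id, prop=prop)
    if data:
        return data

    # (2) Nothing found: check whether the id maps to a canonical mp-id.
    canonical_id = mpr.get_materials_id_from_task_id(mp_id)

    # (3) If it does, return that data but tell the user about the redirect.
    if canonical_id and canonical_id != mp_id:
        warnings.warn(
            f"{mp_id} is not a current material id; "
            f"returning data for {canonical_id} instead."
        )
        return mpr.get_data(canonical_id, prop=prop)

    # No redirect available: return the empty result unchanged.
    return data
```

For a bulk run, the standard library's warnings.catch_warnings(record=True) context manager can collect these warnings into a list instead of printing one per id, so the user is still informed without the output being flooded.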

Anubhav_Jain · July 16, 2020, 11:21pm

Yes, I think having the warning (to avoid confusion with direct task queries) but still returning data (instead of it seeming like the data disappeared) would be good. Note that to me, it’s much more likely that someone will be in my situation (materials_id changes / gets reassigned) than that someone will get “bitten” by having different data from the get_task_data vs get_data functions.