How to get a list of mp-ids for all 49,705 materials

Hello,

I am working on a graph neural network based model of materials structures, and for that I need to download all 49,705 materials. To do so I need a list of all mp-ids. How can I get that list, so that I can use pymatgen to download the CIF files of those 49,705 materials? Please help.

Please use the pymatgen interface. With it you can query for all the structures and then write each one out as a CIF file. Something like this:

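from pymatgen.ext.matproj import MPRester

with MPRester("INSERT_API_KEY_HERE") as mpr:
    # Empty criteria {} matches every entry; only these two fields are returned.
    docs = mpr.query({}, ["structure", "task_id"])
    for d in docs:
        # Write each structure to a CIF file named after its task_id.
        d["structure"].to(filename=f"{d['task_id']}.cif", fmt="cif")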

This also ensures the data is fetched from our server efficiently.

Thanks for the reply!

What should we pass as “structure” and “task_id”? How do I get the structure ID of all the materials? Please help, I am new to this field.

You don’t pass anything as “structure” and “task_id”. They are the names of the fields to return. The empty criteria {} asks the DB for every entry, and the query retrieves only the structure and task_id fields of each one.
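If all you want is the list of mp-ids, a minimal sketch along the same lines would request only the task_id field. This assumes the legacy MPRester.query(criteria, properties) call used in this thread; extract_ids, fetch_all_ids and the output file name are made-up names for illustration, not part of pymatgen.

```python
def extract_ids(docs):
    """Pull the task_id out of each returned document (a list of dicts)."""
    return [d["task_id"] for d in docs]


def fetch_all_ids(api_key):
    """Hypothetical helper: query every entry but return only its task_id."""
    # Imported here so extract_ids can be used without pymatgen installed.
    from pymatgen.ext.matproj import MPRester

    with MPRester(api_key) as mpr:
        # {} matches every entry; ["task_id"] limits the returned fields.
        return extract_ids(mpr.query({}, ["task_id"]))


def save_ids(ids, path="mp_ids.txt"):
    """Write one ID per line so the list can be reused later."""
    with open(path, "w") as fh:
        fh.write("\n".join(ids))
```

You would then call something like save_ids(fetch_all_ids("INSERT_API_KEY_HERE")) to keep the list on disk.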

OK, but when I run the above script, it throws the following error:

pymatgen.ext.matproj.MPRestError: Too much data requested in one query. Please break down your query into sub-queries.

I think it is trying to fetch too much data at once. How do I deal with this?

Which version of pymatgen are you using? The latest one should automatically chunk this query for you. I just tried it and it does that.
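If upgrading is not an option, a rough sketch of breaking the request into sub-queries by hand might look like the following. It assumes you already have a list of task_ids, and that the legacy query call accepts a MongoDB-style {"task_id": {"$in": [...]}} criteria. The names chunked and download_cifs and the chunk size of 1000 are arbitrary choices for illustration, not pymatgen defaults.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def download_cifs(api_key, task_ids, chunk_size=1000):
    """Hypothetical helper: fetch structures batch by batch and write CIFs."""
    # Imported here so chunked() can be used without pymatgen installed.
    from pymatgen.ext.matproj import MPRester

    with MPRester(api_key) as mpr:
        for batch in chunked(task_ids, chunk_size):
            # Assumed criteria syntax: restrict this sub-query to one batch.
            docs = mpr.query({"task_id": {"$in": batch}},
                             ["structure", "task_id"])
            for d in docs:
                d["structure"].to(filename=f"{d['task_id']}.cif", fmt="cif")
```

If a batch still triggers the same error, lowering chunk_size is the first thing to try.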