How can I extract structural information from the data downloaded in bulk from the Materials Project?



I have downloaded all the data according to the recommended method of Materials Project. What format should I save the docs in, and how can I extract the structural information?

Hi @Chuanxi-cairen,

I’m not sure what “structural information” means to you, but if you just want the structure field for each of those documents, a simple list comprehension will get you there:

structs = [d["structure"] for d in docs]

Re: saving the docs, I assume you mean saving the query results to disk? I would recommend using the json module from the standard library to dump the list into a .json file. All of the docs you get from MP are valid JSON.
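As a minimal sketch of that save-and-reload step, assuming docs is a list of plain dicts (the single entry below is a made-up placeholder, not real MP data, and the file name docs.json is arbitrary):

```python
import json
import os
import tempfile

# Hypothetical stand-in for the query results: a list of plain dicts.
# Real documents carry many more fields; "mp-0000" is not a real ID.
docs = [
    {"material_id": "mp-0000", "structure": {"note": "placeholder"}},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "docs.json")

    # Write the whole list to a single JSON file.
    with open(path, "w") as f:
        json.dump(docs, f)

    # Read it back later without querying the API again.
    with open(path) as f:
        loaded = json.load(f)

print(len(loaded), "documents reloaded")
```

If the docs contain objects that are not plain dicts, lists, strings, or numbers, json.dump will raise a TypeError, so this only works as-is for dict-style results.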

If you are curious about the fields available in each of those documents, I would recommend consulting the API reference for the schema: summary endpoint available fields

You can also get an idea of the schema for the other endpoints by expanding the other entries there.

Thank you very much! I extracted a list using strucs2 = [d["structure"] for d in docs] from docs, but what I actually need is information in the form of Structure objects. What should I do?

Gotcha, easy enough using the same principle:

# snip, your above code
from pymatgen.core import Structure

structs = [Structure.from_dict(d["structure"]) for d in docs]

This pattern applies broadly to most pymatgen objects.

It works. Thanks a lot!