Downloading all piezoelectric materials

Hello,

I’m writing to inquire about downloading the structure-property metadata for the 941 piezoelectric materials analyzed in de Jong et al.’s 2015 paper in Nature Scientific Data. Rather than querying each structure based on specific properties, we are interested in downloading and analyzing the entire set of 941. Is there an efficient way for us to do this?

Thank you!

Yes, you can use our API to get the data you need. Using pymatgen's MPRester:

from pymatgen import MPRester
mpr = MPRester()
data = mpr.query({'piezo': {'$exists': True}}, ['material_id', 'piezo'])

You can read more about use of our API, in particular the powerful /rest/v2/query endpoint in conjunction with MongoDB query syntax, here.
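As a rough sketch of how the query result could be kept for offline analysis: mpr.query returns a list of dictionaries, so it can be written straight to a JSON file. The helper names fetch_piezo_docs and save_docs and the file name piezo.json are made up for illustration, and the import path is the one used in this thread, which may differ in other pymatgen versions.

```python
import json


def fetch_piezo_docs(api_key=None):
    """Run the query from this thread (needs network access and an API key)."""
    # Import path as used in this thread; other pymatgen versions may differ.
    from pymatgen import MPRester

    mpr = MPRester(api_key) if api_key else MPRester()
    return mpr.query({'piezo': {'$exists': True}}, ['material_id', 'piezo'])


def save_docs(docs, path):
    """Write a list of result dictionaries to a JSON file; return the count."""
    with open(path, 'w') as f:
        json.dump(docs, f, indent=2)
    return len(docs)


# Usage (hypothetical file name; requires network access and an API key):
# docs = fetch_piezo_docs()
# save_docs(docs, 'piezo.json')
```

Keeping the network call and the file writing in separate functions means the saved file can be reloaded with json.load later without querying the API again.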

Hi @dwinston, is it possible to do this directly from the Materials Project website, without writing code? I understand that one can use the API to obtain a JSON file containing all the information. But can I download the structure (.cif) files directly to a specified location, as is usually the case when one downloads from the website? For example, I want to download all materials whose band gap lies between 0.5 and 1.2 eV, or all materials that belong to the Zintl phase compounds. How would I approach this? Is there a general way to handle such requirements?

[This is probably an extension of the original question.]

Hi @George_Yumnam, there is currently no quick way via the website to obtain all CIFs for a specified set of structures. This is most flexibly and reproducibly done via the API.

I have written a small IPython notebook that walks through building a query for Zintl compounds, fetching data (including CIF strings) for those compounds, and using pymatgen to write out CIF files in a zip archive.

To ease issues with running the code locally on your computer, I have registered a Jupyter notebook binder that you can launch in your browser. Click on the get_zintl_cifs.ipynb link after launching to open the notebook and walk through it. If you’re unfamiliar with the interface, you can click on Help at top once the notebook is open, and then User Interface Tour.
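For readers who cannot open the notebook, here is a minimal sketch of the zip-writing step. It is not the notebook's actual code. It assumes each document returned by the query carries 'material_id', 'pretty_formula' and 'cif' fields (the last one a CIF string), and it uses only the standard-library zipfile module rather than pymatgen. The function name write_cifs_to_zip and the file-naming scheme are hypothetical.

```python
from zipfile import ZipFile


def write_cifs_to_zip(docs, zip_path):
    """Write one .cif file per document into a zip archive; return the names."""
    names = []
    with ZipFile(zip_path, 'w') as f:
        for d in docs:
            # File-naming scheme is an illustrative choice, not the notebook's.
            name = '{}_{}.cif'.format(d['material_id'], d['pretty_formula'])
            # 'cif' is assumed to hold the CIF text as a plain string.
            f.writestr(name, d['cif'])
            names.append(name)
    return names


# Usage, with docs coming from a query that requests these three fields:
# write_cifs_to_zip(docs, 'zintl_cifs.zip')
```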

For your band gap query, replacing

{'chemsys': {'$in': zintl_systems()}}

with

{'band_gap': {'$gte': 0.5, '$lte': 1.2}}

in the notebook will do the trick. The query syntax is that of MongoDB, and hierarchical documentation on the data we have available for querying is at our API reference repository.
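To make the meaning of the band gap criteria concrete: $gte and $lte are inclusive lower and upper bounds, so materials with a gap of exactly 0.5 or 1.2 eV are matched as well. The sketch below mimics that behaviour locally in plain Python on made-up documents. It only illustrates the semantics and is not how the server evaluates queries; the matches helper and the example IDs are hypothetical.

```python
def matches(doc, criteria):
    """Return True if doc satisfies every range condition in criteria.

    Only the two operators used in this thread's band gap query are handled.
    """
    ops = {
        '$gte': lambda value, bound: value >= bound,
        '$lte': lambda value, bound: value <= bound,
    }
    for field, conditions in criteria.items():
        if field not in doc:
            return False
        for op, bound in conditions.items():
            if not ops[op](doc[field], bound):
                return False
    return True


criteria = {'band_gap': {'$gte': 0.5, '$lte': 1.2}}

# Made-up documents, purely for illustration.
docs = [
    {'material_id': 'example-a', 'band_gap': 0.3},
    {'material_id': 'example-b', 'band_gap': 0.5},
    {'material_id': 'example-c', 'band_gap': 1.1},
    {'material_id': 'example-d', 'band_gap': 1.5},
]

selected = [d['material_id'] for d in docs if matches(d, criteria)]
print(selected)  # ['example-b', 'example-c']
```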

This approach is the most general way to get nearly anything you want from our database. The website offers convenient shortcuts for typical use cases. I do see your use case as common enough to warrant additional features on the website. We’ll try to work in those features soon.

Hey @dwinston, thank you very much for this wonderful code. However, I get an error at the docs query step when I run the notebook from my local Jupyter installation. It works fine when run from the notebook binder launched from the link you provided. This is the error I get locally:

TypeError                                 Traceback (most recent call last)
<ipython-input-7-4ed7a36bbf0b> in <module>()
      1 mpr=MPRester("----")
      2 
----> 3 docs = mpr.query({'chemsys': {'$in': zintl_systems()}}, 
                           ['material_id', 'pretty_formula','cif'])
      4 
      5 with ZipFile('zintl_cifs.zip', 'w') as f:

<ipython-input-6-c7ebf31168bc> in zintl_systems()
      5         of the form [...,"Na-Si",...,"Na-Tl",...]
      6     """
----> 7     first_el = {el.symbol for el in Element
      8                 if el.is_alkali or el.is_alkaline}
      9     second_el = {el.symbol for el in Element

TypeError: 'type' object is not iterable

My guess is that the discrepancy between the binder environment and your local environment is your version of pymatgen. Please try upgrading (pip install -U pymatgen). I think that Element was not iterable in past versions.
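The error message itself can be reproduced with the standard library alone. The sketch below assumes that the difference between versions is that Element became a Python Enum, which this thread does not confirm. Iterating over a plain class raises exactly the TypeError from the traceback, while iterating over an Enum class yields its members. Both classes here are stand-ins, not pymatgen code.

```python
from enum import Enum


class OldStyleElement:
    """Stand-in for a plain class: the class itself cannot be iterated."""
    H = 'H'
    Li = 'Li'


class EnumElement(Enum):
    """Stand-in for an Enum-based class: iterating yields its members."""
    H = 'H'
    Li = 'Li'


def symbols(element_cls):
    """Collect symbols by iterating over the class itself."""
    return [el.value for el in element_cls]


try:
    symbols(OldStyleElement)
except TypeError as err:
    message = str(err)
    print('plain class:', message)

print('enum class:', symbols(EnumElement))
```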

Thanks a lot! The script worked fine with the new version of pymatgen (4.3.0).

However, I still get an error while running the code (shown below):

pymatgen.matproj.rest.MPRestError: ('Connection broken: 
    IncompleteRead(6990 bytes read, 3250 more expected)', 
    IncompleteRead(6990 bytes read, 3250 more expected))

Is this normal, or is it due to a slow connection?

I need to run the script two or three times before it completes.

It’s not normal. It’s a low-level network error due to an unstable connection.
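If the connection cannot be made more stable, one common workaround is to retry the failing call automatically instead of rerunning the whole script by hand. This is a minimal sketch using only the standard library; with_retries and its parameters are hypothetical names, and catching every Exception is a simplification made for illustration.

```python
import time


def with_retries(func, attempts=3, delay=2.0, exceptions=(Exception,)):
    """Call func(); on failure wait and try again, up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay * attempt)  # wait a little longer each time


# Usage, wrapping the query from the notebook (criteria and properties
# stand for the two arguments already passed to mpr.query):
# docs = with_retries(lambda: mpr.query(criteria, properties))
```

Passing a narrower tuple for exceptions, for example the MPRestError class shown in the error above, would avoid retrying on unrelated failures such as a typo in the query.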

Thanks a lot once again @dwinston

This discussion is very useful for learning how to query data sets from the Materials Project. Also, those notebooks from dwinston are awesome.

Hi @dwinston, is there a way to download .cif files if I have a list of material IDs?
Please suggest a solution. I am not familiar with Python scripting, so your help will be appreciated.
Thanks in advance!

For example, how can I download CIF files if I have material IDs such as:
mp-10070
mp-10086
mp-10096
mp-10103
mp-10155
mp-10159
mp-10163
mp-1022
mp-10223
@dwinston, kindly help with a solution.

Thanks!

Hi @neuron, welcome to the forum. It’s really useful to be able to use a script, so that you don’t need to click to download one by one. The bare URL for a cif file is something like

https://materialsproject.org/materials/mp-10070/cif?type=symmetrized

where the type defaults to conventional_standard if not specified. So if you had a file of material IDs, one per line, you could use a loop of some kind in any scripting language to replace the material ID in the URL string. I’m not sure if you are on a Mac, Linux, Windows, etc., but I think if you go to a colleague with some experience in any scripting language, my description here will be sufficient for them to teach you to write a script of a few lines to download all the CIF files you need. Hope this helps.
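As one possible version of such a loop, here is a short Python sketch that reads material IDs from a text file (one per line), builds the URL for each, and saves the result as a .cif file named after the ID. The function names and the ids.txt file name are hypothetical, and whether the site serves these URLs to a script without logging in has not been verified here.

```python
from urllib.request import urlretrieve

# URL pattern taken from the post above; {} slots are the ID and the CIF type.
URL_TEMPLATE = 'https://materialsproject.org/materials/{}/cif?type={}'


def read_ids(path):
    """Read material IDs from a text file, one per line, skipping blanks."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def cif_url(material_id, cif_type='symmetrized'):
    """Build the bare CIF URL for one material ID."""
    return URL_TEMPLATE.format(material_id, cif_type)


def download_cifs(material_ids, cif_type='symmetrized'):
    """Download one CIF per ID into the current directory (needs network)."""
    for mid in material_ids:
        urlretrieve(cif_url(mid, cif_type), mid + '.cif')


# Usage (hypothetical file name):
# download_cifs(read_ids('ids.txt'))
```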

Hi everyone, I made a shell script on Linux to download CIF files. Given the Materials Project IDs, it downloads the primitive CIF files of the materials of interest.

Thanks @dwinston for your suggestion!!

Script (save as a .sh file):

# List the required material IDs here
for i in mp-10070 mp-10086 mp-10096 mp-10103
do
    # Download the primitive CIF for each material ID; for the conventional cell, use type=symmetrized
    wget "https://materialsproject.org/materials/$i/cif?type=primitive"
    # wget saves the file under the name 'cif?type=primitive'; if you switched to type=symmetrized, change it here too
    mv 'cif?type=primitive' "$i.cif"
done
