How to obtain batch data excluding rare-earth elements

Dear all:
I want to download a lot of data, but I am running into access restrictions. How should I deal with this?

from mp_api.client import MPRester
import pandas as pd

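with open('key.txt') as f:
    key = f.read().strip()  # strip the trailing newline from the key file

with MPRester(api_key=key) as mpr:
    # Request only the needed fields for every material that contains
    # none of the listed elements.
    docs = mpr.summary.search(exclude_elements=['Pb', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd',
                                                'Tb', 'Dy', 'Er', 'Tm', 'Yb', 'Lu', 'Y', 'Sc'],
                              fields=["material_id",
                                      "formula",
                                      "band_gap",
                                      "volume"
                                      ])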


test = pd.DataFrame(docs)
print(test)
test.to_csv('./b.csv', encoding='gbk')

mp_api.client.core.client.MPRestError: HTTPSConnectionPool(host='api.materialsproject.org', port=443): Max retries exceeded with url: /materials/summary/?exclude_elements=Pb%2CLa%2CCe%2CPr%2CNd%2CPm%2CSm%2CEu%2CGd%2CTb%2CDy%2CEr%2CTm%2CYb%2CLu%2CY%2CSc&_limit=1000&_fields=material_id%2Cformula%2Cband_gap%2Cvolume&_skip=46000 (Caused by ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))

Just checking, have you updated to the most recent version of mp-api?

pip install --upgrade mp-api

@mattmcdermott yes, the request goes to /materials/summary/, which means the client is up to date. As far as I can tell from the logs, most of the requests go through successfully, but for some the API server takes too long to respond and the client closes the connection. That can happen when the server is temporarily busy. We're looking into whether this is an efficiency issue on our end or whether we can improve the client to retry the failed requests instead of aborting. @munrojm
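
Until something like that exists in the client, a retry can be wrapped around the call by hand. The following is only a rough sketch, not how mp-api handles retries internally: `call_with_retries`, `fetch`, and the delay values are hypothetical names and numbers chosen for illustration, and the helper catches any exception by default, which you would probably want to narrow to the error you actually see.

```python
import time


def call_with_retries(fetch, max_attempts=5, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fetch() and retry with exponential backoff when it raises.

    All names here are hypothetical; this is not part of mp-api.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1 s, 2 s, 4 s, ...
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.0f} s")
            sleep(delay)


# Assumed usage with the query from the first post (not executed here):
# with MPRester(api_key=key) as mpr:
#     docs = call_with_retries(
#         lambda: mpr.summary.search(exclude_elements=[...], fields=[...])
#     )
```

Note that this retries the whole query from the start rather than only the chunk that failed, so it is wasteful for a download of this size; it mainly helps when failures are rare.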

Thank you very much. I look forward to hearing from you.

@Xyy11, this should be working now. I can execute the same query without issue. Since you are pulling the majority of the database with this query (~112k entries), it may also be faster to simply supply no criteria, keep the fields projection, and filter locally. In general, you can also get an initial sense of how many documents a particular query will match by using the count method.

– Jason
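
The "filter locally" suggestion above could look roughly like the sketch below. It assumes each downloaded record has been converted to a plain dict and that an "elements" field (a list of element symbols) was requested alongside the other fields. `drop_excluded` is a hypothetical helper, and the sample records are made-up placeholders, not real Materials Project entries.

```python
# Same element list as in the original query.
EXCLUDED = {'Pb', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd',
            'Tb', 'Dy', 'Er', 'Tm', 'Yb', 'Lu', 'Y', 'Sc'}


def drop_excluded(records, excluded=EXCLUDED):
    """Keep only records whose 'elements' list shares nothing with `excluded`."""
    return [r for r in records if excluded.isdisjoint(r["elements"])]


# Made-up placeholder records standing in for the downloaded documents.
sample = [
    {"material_id": "demo-1", "elements": ["Si", "O"]},
    {"material_id": "demo-2", "elements": ["La", "O"]},
    {"material_id": "demo-3", "elements": ["Pb", "Te"]},
    {"material_id": "demo-4", "elements": ["Ga", "As"]},
]

kept = drop_excluded(sample)
print([r["material_id"] for r in kept])  # ['demo-1', 'demo-4']
```

Before downloading anything, the count method mentioned above can tell you how many documents a query would return; check the mp-api documentation for its exact arguments.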