The current API seems to be mostly tailored toward downloading all information about a single compound. Would it be possible to download a single piece of information about all compounds?
In particular, I would expect this request
https://www.materialsproject.org/rest/v1/materials//vasp/density?API_KEY=XXX
with an empty material parameter to return all materials in the database, but it fails because the request is too large. How would I go about creating this request? Is it possible to download the backend database in its entirety?
My application is machine learning, and I need a rather large data set to build my models.
For future reference, I did find a workable solution:
from pymatgen import MPRester
import urllib.request
import json

if __name__ == "__main__":
    MAPI_KEY = "XXXXX"  # You must change this to your Materials API key! (or set MAPI_KEY env variable)
    # fetch a list of all available material IDs
    with urllib.request.urlopen('https://www.materialsproject.org/rest/v1/materials//mids') as myurl:
        data = json.loads(myurl.read().decode())
        material_ids = data['response']  # 75,000'ish material IDs are returned
    with MPRester(MAPI_KEY) as m:  # object for connecting to MP Rest interface
        criteria = {'material_id': {'$in': material_ids[:4]}}  # to avoid straining the servers, this is only using the first 4 materials
        properties = ['energy', 'pretty_formula']  # list a few quantities of interest
        data = m.query(criteria, properties)
        print(data)
Hi Vikingscientist,
You were hitting the size limit on returned results, which keeps the API from getting overloaded. The API is well-suited to return the information you’re looking for, but you have to break your query up into smaller batches to avoid this limit.
Whenever I need to do something similar to what you’re trying to do, I first query for all the mp-id’s using the MPRester and store them in a python list. After that, I iterate through the list of mp-id’s and query for the properties of interest about 1000 materials at a time, depending on the property.
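from pymatgen import MPRester

r = MPRester()  # no key passed here, so it must be available from your pymatgen configuration
# query() returns a list of dicts, so pull out the ID strings
mp_ids = [doc["material_id"] for doc in r.query({}, ["material_id"])]
chunk_size = 1000
sublists = [mp_ids[i:i+chunk_size] for i in range(0, len(mp_ids), chunk_size)]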
Then you can query for each sublist:
results = []
for sublist in sublists:
    results += r.query({"material_id": {"$in": sublist}}, ["pretty_formula", "structure"])
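If you need this in more than one script, the batching described above can be wrapped in a small helper. What follows is only a sketch: `chunked`, `query_in_batches`, and `fake_query` are hypothetical names introduced here, not part of pymatgen, and `fake_query` is a stand-in for `MPRester.query` so that the example runs without an API key or network access.

```python
from typing import Callable, Dict, List, Sequence


def chunked(items: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split items into consecutive sublists of at most chunk_size elements."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [list(items[i:i + chunk_size]) for i in range(0, len(items), chunk_size)]


def query_in_batches(
    query: Callable[[Dict, List[str]], List[Dict]],
    material_ids: Sequence[str],
    properties: List[str],
    chunk_size: int = 1000,
) -> List[Dict]:
    """Call query(criteria, properties) once per batch of IDs and merge the results."""
    results: List[Dict] = []
    for sublist in chunked(material_ids, chunk_size):
        criteria = {"material_id": {"$in": sublist}}
        results.extend(query(criteria, properties))
    return results


def fake_query(criteria: Dict, properties: List[str]) -> List[Dict]:
    """Hypothetical offline stand-in for MPRester.query: echoes back the requested IDs."""
    return [{"material_id": mid} for mid in criteria["material_id"]["$in"]]


if __name__ == "__main__":
    ids = ["mp-%d" % i for i in range(2500)]
    docs = query_in_batches(fake_query, ids, ["pretty_formula"], chunk_size=1000)
    print(len(docs))  # one result per ID, fetched in three batches
```

With a real connection you would pass `r.query` in place of `fake_query`, for example `query_in_batches(r.query, mp_ids, ["pretty_formula", "structure"])`. If a property is large (such as `structure`), a smaller `chunk_size` may be needed to stay under the size limit.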
@Vikingscientist Hello, sorry if this question is a bit late. How do I properly use the code you posted to query the database? Any help would be greatly appreciated.