API server error for some queries

For some chemical systems I get an MPRestError from the same code that works for most other systems. The error seems somewhat sporadic, but it is mostly reproducible. For example, the following code:

m = MPRester()
res = m.query(criteria={"elements": {"$in": ["Ge", "Se", "O"]}, "nelements": {"$eq": 3}},

produces the following:

File "/home/abc/anaconda/lib/python2.7/site-packages/pymatgen/matproj/rest.py", line 150, in _make_request
raise MPRestError(msg)
pymatgen.matproj.rest.MPRestError: REST query returned with error status code 500. Content:

500 Internal Server Error

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, root@localhost and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

Apache/2.2.15 (CentOS) Server at www.materialsproject.org Port 443

I’m not 100% sure why the query is giving this error (we’re looking into it), but as structured it is pretty heavy, so it’s likely timing out. Just to double-check: the $in operator asks for every compound that contains any of Ge, Se, or O. Compounds that contain exactly a given set of three elements should instead be queried with the $all operator together with nelements set to 3, e.g.

m = MPRester()
res = m.query(criteria={"elements": {"$all": ["Sr", "Fe", "O"]}, "nelements": 3},

Is this what you intended?
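To make the difference concrete, here is a minimal sketch that mimics the two operators on a handful of made-up records, with no server involved. The `matches` helper and the toy `docs` list are illustrative assumptions, not part of pymatgen or the Materials Project API; they only mirror how MongoDB-style `$in` and `$all` treat an array field.

```python
# Toy records standing in for Materials Project documents (made-up subset).
docs = [
    {"formula": "NiO", "elements": ["Ni", "O"]},
    {"formula": "GeO2", "elements": ["Ge", "O"]},
    {"formula": "SrFeO3", "elements": ["Fe", "O", "Sr"]},
    {"formula": "Sr2FeMoO6", "elements": ["Fe", "Mo", "O", "Sr"]},
]


def matches(doc, operator, wanted):
    """Hypothetical helper mimicking MongoDB array matching for $in and $all."""
    present = set(doc["elements"])
    if operator == "$in":   # at least one wanted element is present
        return bool(present & set(wanted))
    if operator == "$all":  # every wanted element is present
        return set(wanted) <= present
    raise ValueError("unsupported operator: %s" % operator)


# $in: anything containing Ge, Se, or O (every toy record contains O).
any_of = [d["formula"] for d in docs if matches(d, "$in", ["Ge", "Se", "O"])]

# $all: must contain Sr, Fe, and O, but extra elements are still allowed.
all_of = [d["formula"] for d in docs if matches(d, "$all", ["Sr", "Fe", "O"])]

# $all plus nelements == 3: exactly the Sr-Fe-O ternaries.
exact = [d["formula"] for d in docs
         if matches(d, "$all", ["Sr", "Fe", "O"]) and len(d["elements"]) == 3]

print(any_of)  # ['NiO', 'GeO2', 'SrFeO3', 'Sr2FeMoO6']
print(all_of)  # ['SrFeO3', 'Sr2FeMoO6']
print(exact)   # ['SrFeO3']
```

Note that `$all` on its own never matches a compound with fewer elements than the list you pass in, so combining it with an upper bound on nelements equal to the list length can only return compounds with exactly those elements.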

What is happening is that you are requesting so much data in one shot (full structure information for nearly 18,000 compounds – all three-element compounds containing any of Ge, Se, or O) that the server process handling your request runs out of the memory it’s been allocated, and you get a generic and unhelpful server error response. Sorry about that.

In an effort to be more helpful, we now estimate the response size before fully executing the query and, if it’s too large, you’ll get an MPRestError: Too much data requested in one query. Please break down your query into sub-queries.

Below is some example code that would be helpful for your specific query. I use the pydash library to chunk the list of material ids I get from an initial query and use the chunks to get the material properties desired. I also use the tqdm library to measure in-loop progress for each chunk.

from pydash import py_
from pymatgen import MPRester
from tqdm import tqdm

m = MPRester()

properties = ["material_id","structure","final_energy_per_atom"]
criteria = {"elements": {"$in": ["Ge", "Se", "O"] }, "nelements":{"$eq":3}}

res_ids = [r["material_id"] for r in
           m.query(criteria=criteria, properties=["material_id"])]

results = []
for chunk in tqdm(py_.chunk(res_ids, 1000)): # 1000 materials at a time
    results.extend(m.query(criteria={"material_id": {"$in": chunk}},
                           properties=properties))
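
If you would rather not depend on pydash, the same batching can be sketched with the standard library alone. In the snippet below, `chunked` and `fake_query` are hypothetical names: `fake_query` stands in for `m.query` so the example runs without an API key, and the material ids are made up.

```python
def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fake_query(criteria, properties):
    """Stand-in for MPRester.query (assumption): one stub record per requested id."""
    return [{"material_id": mid} for mid in criteria["material_id"]["$in"]]


res_ids = ["mp-%d" % i for i in range(1, 2501)]  # made-up ids

results = []
batch_sizes = []
for chunk in chunked(res_ids, 1000):  # 1000 materials at a time
    batch_sizes.append(len(chunk))
    results.extend(fake_query({"material_id": {"$in": chunk}},
                              properties=["material_id"]))

print(batch_sizes)   # [1000, 1000, 500]
print(len(results))  # 2500
```

Swapping `fake_query` for `m.query` and the made-up ids for the ids returned by the initial query should give the same loop as above, just without the extra dependency.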


Your code still seems to produce an error. Specifically, if I run the code as given in Joseph’s response, it works. However, if I change his set of elements [‘Sr’, ‘Fe’,‘O’] to the one I started with, [‘Ge’, ‘Se’,‘O’], I get an error:

Traceback (most recent call last):
File "main.py", line 23, in <module>
File "/home/abc/anaconda/lib/python2.7/site-packages/pymatgen/matproj/rest.py", line 601, in query
File "/home/abc/anaconda/lib/python2.7/site-packages/pymatgen/matproj/rest.py", line 150, in _make_request
raise MPRestError(msg)
pymatgen.matproj.rest.MPRestError: local variable 'first' referenced before assignment. Content: {"valid_response": false, "version": {"pymatgen": "2017.11.9", "db": "2.0.0", "rest": "2.0"}, "traceback": "Traceback (most recent call last):\n File \"/var/www/python/matgen/materials_django/rest/rest.py\", line 91, in wrapped\n d = func(*args, **kwargs)\n File \"/var/www/python/matgen/materials_django/rest/rest.py\", line 182, in query\n raise RESTError(str(ex))\nRESTError: local variable 'first' referenced before assignment\n", "created_at": "2017-11-10T17:11:04.104601", "error": "local variable 'first' referenced before assignment"}

Maybe I’m missing something here: I am not looking to get thousands of materials, and I don’t want to get entries that contain elements other than [Se,Ge,O] (e.g. I don’t need NiO); however, I do want to get binaries and pure elements (e.g. GeO2 or O2). How do I return the entire chemical system (with all subsystems) but nothing else?

In my original example I simplified this by requesting nelements==3, but my actual code uses nelements<=3, and then discards the structures with additional elements:

if set(entry["elements"]) <= set(element_list):
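
A self-contained sketch of that filter, using made-up records in place of the real query results (the `entries` list and its `pretty_formula` values are placeholders for illustration only):

```python
element_list = ["Ge", "Se", "O"]

# Placeholder records standing in for query results (made up for illustration).
entries = [
    {"pretty_formula": "O2", "elements": ["O"]},
    {"pretty_formula": "GeO2", "elements": ["Ge", "O"]},
    {"pretty_formula": "GeSe", "elements": ["Ge", "Se"]},
    {"pretty_formula": "NiO", "elements": ["Ni", "O"]},
]

# Keep an entry only if its elements are a subset of the requested system.
kept = [entry["pretty_formula"] for entry in entries
        if set(entry["elements"]) <= set(element_list)]

print(kept)  # ['O2', 'GeO2', 'GeSe']
```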

If I instead try using the “$all” keyword with ‘nelements’:{"$lte":3} (for Sr-Fe-O as in Joseph’s response, because I am still getting a server error for Ge-Se-O, see my previous reply), it returns only ternaries no matter what condition I set for nelements! What should I do?

Sorry, in our effort to better quantify large queries we introduced a server-side bug, which we’re resolving, that affects queries that don’t turn up any results (we don’t currently have any data for a ternary Ge-Se-O compound). Here’s a snippet that works for me for the time being. This is essentially how the MPRester.get_entries_in_chemsys method works: it takes combinations of the elements and queries them individually.

import itertools
from pymatgen import MPRester
m = MPRester()

elements = ["Ge", "Se", "O"]
entries = []
for i in range(len(elements)):
    for combo in itertools.combinations(elements, i + 1):
        entries.extend(m.query({"elements": {"$all": combo}, "nelements": i + 1},
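
To see what those nested loops actually ask the server for, the sketch below only builds the criteria dictionaries and prints them, without making any request. `subsystem_criteria` is a hypothetical helper name, and I convert each combination to a list so the criteria are plain JSON-style data.

```python
import itertools


def subsystem_criteria(elements):
    """Build one criteria dict per non-empty element combination (hypothetical helper)."""
    criteria = []
    for n in range(1, len(elements) + 1):
        for combo in itertools.combinations(elements, n):
            # $all plus nelements == n selects compounds made of exactly these elements.
            criteria.append({"elements": {"$all": list(combo)}, "nelements": n})
    return criteria


all_criteria = subsystem_criteria(["Ge", "Se", "O"])
for c in all_criteria:
    print(c)
print(len(all_criteria))  # 7 = 3 unaries + 3 binaries + 1 ternary
```

Each dictionary can then be passed as the criteria of a separate query, together with whatever property list you need, so no single request covers more than one sub-system.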

Hi Sergey,

Thanks for clarifying your intended query. Joey’s right that our MPRester.get_entries_in_chemsys convenience method is designed precisely to call MPRester.query iteratively with the appropriate criteria and build up a list of results, e.g. for creating phase diagrams of entire chemical systems. For example:

from pymatgen import MPRester

m = MPRester()
res = m.get_entries_in_chemsys(
    ["Ge", "Se", "O"], inc_structure="final")
# structures accessed as e.g. res[0].structure
# material ids accessed as e.g. res[0].entry_id
# `property_data` accessed as e.g. res[0].data
# You can also just do
# m.get_entries_in_chemsys(
#     ["Ge", "Se", "O"],
#     property_data=["material_id", "structure", "final_energy_per_atom"])
# and access all properties via e.g. res[0].data

Thanks a lot, this works!