Dear everyone,
I think that downloading all Electrolyte Genome data with requests.get returns some distinct molecules that share the same “task_id” value. The following Python script illustrates the problem (sorry for the awkward formatting):
import requests, sys, os

if sys.version_info[0] == 2:
    from urllib import quote_plus
else:
    from urllib.parse import quote_plus
def MAPI_KEY():
    try:
        return os.environ["MAPI_KEY"]
    except LookupError:
        print("MAPI_KEY environmental variable needs to be set.")
        quit()
urlpattern = {
    "results": "https://materialsproject.org/molecules/results?query={spec}",
    "mol_json": "https://materialsproject.org/molecules/{mol_id}/json",
    "mol_svg": "https://materialsproject.org/molecules/{mol_id}/svg",
    "mol_xyz": "https://materialsproject.org/molecules/{mol_id}/xyz",
}
def get_results(spec, fields=None):
    """Take a specification document (a dict), and return a list of matching molecules.
    """
    # Stringify spec, ensure the string uses double quotes, and percent-encode it...
    str_spec = quote_plus(str(spec).replace("'", '"'))
    # ...because the spec is the value of a "query" key in the final URL.
    url = urlpattern["results"].format(spec=str_spec)
    return (requests.get(url, headers={'X-API-KEY': MAPI_KEY()})).json()
problematic_ids = ["mol-38777", "mol-38770", "mol-39643", "mol-22363", "mol-25918", "mol-23146",
                   "mol-39001", "mol-39068", "mol-14809", "mol-9187"]
results = get_results({})
for cur_id in problematic_ids:
    counter = 0
    MWs = []
    # Count how often this task_id appears and collect the molecular weights.
    for molecule in results:
        if molecule["task_id"] == cur_id:
            counter += 1
            MWs.append(molecule["MW"])
    print("task_id:", cur_id, "; times occurring:", counter, "; MWs:", MWs)
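The ten IDs above are only the ones I happened to notice. To see whether others are affected, the same check can be run over every entry at once. This is a minimal sketch, assuming the results are a list of dicts that each carry "task_id" and "MW" keys as in the script above. The helper name find_duplicate_ids and the sample entries are made up for illustration and are not real database records:

```python
from collections import defaultdict


def find_duplicate_ids(molecules, id_key="task_id", value_key="MW"):
    """Return {id: [values]} for every id that occurs more than once."""
    values_by_id = defaultdict(list)
    for molecule in molecules:
        values_by_id[molecule[id_key]].append(molecule[value_key])
    # Keep only the ids shared by two or more entries.
    return {
        mol_id: values
        for mol_id, values in values_by_id.items()
        if len(values) > 1
    }


# Made-up sample entries standing in for the downloaded results.
sample = [
    {"task_id": "mol-1", "MW": 18.0},
    {"task_id": "mol-2", "MW": 44.0},
    {"task_id": "mol-1", "MW": 32.0},
]
duplicates = find_duplicate_ids(sample)
print(duplicates)
```

Passing the real download instead of the sample, as in find_duplicate_ids(results), would list every repeated task_id together with the molecular weights reported for it.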
Could someone tell me whether this is an issue with the script or with the database? For my application I need the minimum-energy geometry of each database entry I use, so it’s important for me to be sure that “{mol_id}/xyz” corresponds to the correct entry.
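One script-side suspect I tried to rule out is the way the query is encoded. The sketch below, which assumes Python 3, compares the quote-replacement approach from the script with json.dumps, which always emits double-quoted JSON. The function names encode_spec_naive and encode_spec_json are hypothetical, and the "note" field is illustrative rather than a real query field:

```python
import json
from urllib.parse import quote_plus, unquote_plus


def encode_spec_naive(spec):
    """Encoding as in the script: stringify, swap quote characters, percent-encode."""
    return quote_plus(str(spec).replace("'", '"'))


def encode_spec_json(spec):
    """Alternative: serialize to JSON first, then percent-encode."""
    return quote_plus(json.dumps(spec))


# For the empty spec used to download everything, both encodings agree.
assert encode_spec_naive({}) == encode_spec_json({}) == "%7B%7D"

# The JSON variant survives values containing an apostrophe (illustrative field).
spec = {"note": "it's"}
assert json.loads(unquote_plus(encode_spec_json(spec))) == spec
print(encode_spec_json({}))
```

Since both variants produce the same string for the empty spec, the encoding step seems unlikely to be what causes the repeated task_id values, although this does not prove the rest of the script is correct.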