Dear MP community,
Can you recommend a database of molecules in a similar style to the Materials Project (i.e., with a REST API/Python interface and ideally a website where one can browse molecules by chemical formula)?
I appreciate your suggestions and help!
Best,
Peter
Hi @peterschindler,
First off, we do have a molecules database as part of the Materials Project (https://materialsproject.org/#search/molecules). You can browse the Molecules Explorer by elements, formula, or InChI, and the data should be accessible via our API/Python interface.
However, a variety of other molecular databases do exist. A good summary can be found on pages 10-11 of a recent review paper titled “Autonomous discovery in the chemical sciences part II: Outlook” (Angew. Chem. Int. Ed. 10.1002/anie.201909989). You will have to investigate further to determine which have an accompanying API / Python interface / website. If you do so, I would be interested to hear what you find.
Sincerely,
Sam
Hi @Sam_Blau,
Thanks for guiding me in the right direction! I will read up on the references and report back if I find something that fits my requirements.
Also, do you happen to know whether the MAPI has a dedicated tag for searching for molecules instead of crystals? I tried the following, but it doesn’t seem to return any molecules (the same code with 'Structure' instead of 'Molecule' works for crystal structures, though):
from pymatgen.ext.matproj import MPRester

with MPRester("...") as m:
    molecules = m.query(criteria={'structure.@class': 'Molecule',
                                  'nsites': {'$lt': 3},
                                  },
                        properties=['nsites'])
Thanks again!
Best regards,
Peter
Hi @peterschindler,
About a year ago, a user who was successfully scraping molecule data from MP sent me this code snippet:
urlpattern = {
    "results": "https://materialsproject.org/molecules/results?query={spec}",
    "mol_json": "https://materialsproject.org/molecules/{mol_id}/json",
    "mol_svg": "https://materialsproject.org/molecules/{mol_id}/svg",
    "mol_xyz": "https://materialsproject.org/molecules/{mol_id}/xyz",
}

import json
import os
import sys

if sys.version_info[0] == 2:
    from urllib import quote_plus
else:
    from urllib.parse import quote_plus

import requests

MAPI_KEY = ________


def get_results(spec, fields=None):
    """Take a specification document (a `dict`), and return a list of matching molecules.
    """
    # Stringify `spec`, ensure the string uses double quotes, and percent-encode it...
    str_spec = quote_plus(str(spec).replace("'", '"'))
    # ...because the spec is the value of a "query" key in the final URL.
    url = urlpattern["results"].format(spec=str_spec)
    return (requests.get(url, headers={'X-API-KEY': MAPI_KEY})).json()


results = get_results({})
where I’ve removed the user’s MAPI_KEY. I’m not sure if this is the “right” way to do it, but perhaps give it a try and see if it works? @shyamd @mkhorton @tschaume Perhaps one of you has a better answer here?
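If it helps, here is a slightly more defensive sketch of the same idea for Python 3. It assumes the URL templates from the snippet still work, which I have not verified. The helper names (`build_results_url`, `build_molecule_url`, `fetch_json`) are made up for illustration, and it uses the standard library's `urllib.request` rather than `requests` so that it has no third-party dependency. The main change is that the spec is serialized with `json.dumps`, which always produces valid JSON, whereas replacing quotes in `str(spec)` breaks on values such as `True` or `None`.

```python
import json
import urllib.request
from urllib.parse import quote_plus

# URL templates copied from the snippet; assumed, not verified against the live site.
URL_PATTERN = {
    "results": "https://materialsproject.org/molecules/results?query={spec}",
    "mol_json": "https://materialsproject.org/molecules/{mol_id}/json",
}


def build_results_url(spec):
    """Return the results URL for a query spec given as a dict."""
    # json.dumps emits valid JSON (double quotes, true/false/null);
    # quote_plus then percent-encodes it for use as a query-string value.
    return URL_PATTERN["results"].format(spec=quote_plus(json.dumps(spec)))


def build_molecule_url(mol_id):
    """Return the JSON URL for one molecule; the format of mol_id is an assumption."""
    return URL_PATTERN["mol_json"].format(mol_id=mol_id)


def fetch_json(url, api_key, timeout=30):
    """GET a URL with the X-API-KEY header and return the decoded JSON body."""
    request = urllib.request.Request(url, headers={"X-API-KEY": api_key})
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a valid key in `MAPI_KEY`, the equivalent of the original call would be `fetch_json(build_results_url({}), MAPI_KEY)`. Keeping URL construction separate from the network call makes it easy to print and inspect the URL before sending anything.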
Sincerely,
Sam