Recommendation for MP-like molecule database

Dear MP community,
Can you recommend a database of molecules in a similar style to the Materials Project (i.e., with a REST API/Python interface and perhaps a website where one can browse molecules by chemical formula)?
I appreciate your suggestions and help!
Best,
Peter

Hi @peterschindler,

First off, we do have a molecules database as part of the Materials Project (https://materialsproject.org/#search/molecules). You can browse the Molecules Explorer by elements, formula, or InChI, and the data should be accessible via our API/Python interface.

However, a variety of other molecular databases exist. A good summary can be found on pages 10-11 of a recent review paper titled “Autonomous discovery in the chemical sciences part II: Outlook” (Angew. Chem. Int. Ed., DOI: 10.1002/anie.201909989). You will have to investigate further to determine which of them have an accompanying API, Python interface, or website. If you do so, I would be interested to hear what you find.

Sincerely,
Sam

Hi @Sam_Blau,

Thanks for pointing me in the right direction! I will read up on the references and report back if I find something that fits my requirements.

Also, do you know by any chance if there is a dedicated tag for the MAPI to search for molecules instead of crystals? I tried it the following way, but it doesn’t seem to return any molecules (same code with 'Structure' instead of 'Molecule' works for crystal structures though):

from pymatgen.ext.matproj import MPRester

with MPRester("...") as m:
    molecules = m.query(
        criteria={'structure.@class': 'Molecule', 'nsites': {'$lt': 3}},
        properties=['nsites'],
    )

Thanks again!

Best regards,
Peter

Hi @peterschindler,

About a year ago, a user who was successfully scraping molecule data from MP sent me this code snippet:

import json
import sys

if sys.version_info[0] == 2:
    from urllib import quote_plus
else:
    from urllib.parse import quote_plus

import requests

# URL templates for the molecule search results and the per-molecule downloads.
urlpattern = {
    "results": "https://materialsproject.org/molecules/results?query={spec}",
    "mol_json": "https://materialsproject.org/molecules/{mol_id}/json",
    "mol_svg": "https://materialsproject.org/molecules/{mol_id}/svg",
    "mol_xyz": "https://materialsproject.org/molecules/{mol_id}/xyz",
}

MAPI_KEY = "________"  # placeholder: put your own Materials Project API key here

def get_results(spec, fields=None):
    """Take a specification document (a `dict`), and return a list of matching molecules.

    Note: `fields` is accepted but not used in this snippet.
    """
    # Serialize `spec` as JSON (which always uses double quotes) and percent-encode it...
    str_spec = quote_plus(json.dumps(spec))
    # ...because the spec is the value of a "query" key in the final URL.
    url = urlpattern["results"].format(spec=str_spec)
    return requests.get(url, headers={'X-API-KEY': MAPI_KEY}).json()

results = get_results({})

where I’ve removed the user’s MAPI_KEY. I’m not sure if this is the “right” way to do it, but perhaps give it a try and see if it works? @shyamd @mkhorton @tschaume Perhaps one of you has a better answer here?
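
The snippet above only exercises the "results" URL. As a rough sketch of how the other three patterns might be used, the helpers below build the per-molecule URLs and fetch them with the same X-API-KEY header. The helper names (build_results_url, build_molecule_url, fetch) are my own and not part of any MP library, and I am assuming the per-molecule endpoints accept the same header as the results endpoint; I have not verified what they return.

```python
import json
from urllib.parse import quote_plus

# URL patterns copied from the snippet earlier in this post.
URLPATTERN = {
    "results": "https://materialsproject.org/molecules/results?query={spec}",
    "mol_json": "https://materialsproject.org/molecules/{mol_id}/json",
    "mol_svg": "https://materialsproject.org/molecules/{mol_id}/svg",
    "mol_xyz": "https://materialsproject.org/molecules/{mol_id}/xyz",
}


def build_results_url(spec):
    """Return the search URL for a query document (a dict). Hypothetical helper."""
    # JSON-encode the spec, then percent-encode it for use as a query value.
    return URLPATTERN["results"].format(spec=quote_plus(json.dumps(spec)))


def build_molecule_url(mol_id, kind="json"):
    """Return the URL for one molecule; kind is 'json', 'svg', or 'xyz'."""
    key = "mol_" + kind
    if key not in URLPATTERN:
        raise ValueError("unknown kind: %r" % (kind,))
    return URLPATTERN[key].format(mol_id=mol_id)


def fetch(url, api_key):
    """GET a URL with the API key header (assumed to work for every pattern)."""
    import requests  # imported here so the URL helpers work without requests

    response = requests.get(url, headers={"X-API-KEY": api_key}, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response
```

Usage would look something like fetch(build_molecule_url(some_id, "xyz"), MAPI_KEY).text, where some_id is taken from an entry returned by the results query. Which field of a result holds the molecule id is not something I can tell from the snippet, so inspect one result first.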

Sincerely,
Sam
