Question about querying MP data set (JCESR molecules and Molecules explorer)

kimsmj986 · September 20, 2024, 8:01am

Hi all,
I am working on machine learning research using the Materials Project dataset, specifically focusing on organic electrolytes. I have two questions:

1. The molecular datasets are divided into JCESR and molecules, and I understand that data can be retrieved using the jcesr() method and the molecules.summary.search() method, respectively. Could you clarify the difference between these two databases? Are both datasets suitable for battery electrolyte research?
2. When retrieving molecules using the molecules.summary.search() method, there aren’t many ways to impose custom constraints, and if too many molecules are retrieved, errors occur during the query process. Is there a way to resolve this issue?

I would greatly appreciate your assistance.

tschaume · September 20, 2024, 5:56pm

@kimsmj986 Both excellent questions!

1. @espottesmith is the expert and driving force behind our molecules dataset(s). He should be able to answer your questions as they pertain to using the datasets for research.
2. We’re working on making the molecules dataset more performant for querying and downloading, and will open up more filters in the process. In the meantime, you could try retrieving the full dataset and saving it to disk for post-filtering. See here and #4 here for a starting point.
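
To make the "save to disk, then post-filter" idea concrete, here is a minimal sketch in plain Python. It assumes you have already retrieved the records as a list of dictionaries (for example with use_document_model=False); it makes no API calls itself. The helper names (save_records, iter_records, post_filter) and the field names in the usage note (molecule_id, charge) are hypothetical and only for illustration, so check them against the documents you actually get back.

```python
import json
from pathlib import Path


def save_records(records, path):
    """Write one JSON object per line so the file can be streamed later."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")


def iter_records(path):
    """Yield records one at a time without loading the whole file into memory."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def post_filter(path, predicate):
    """Return the saved records for which predicate(record) is true."""
    return [rec for rec in iter_records(path) if predicate(rec)]
```

Usage would look something like save_records(docs, "molecules.jsonl") once after the download, then post_filter("molecules.jsonl", lambda r: r.get("charge") == 0) as often as needed. One record per line keeps memory use flat, and any constraint you can write as a Python function becomes a filter. Records containing non-JSON types would need converting first.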

HTH

espottesmith · September 20, 2024, 6:39pm

Hey @kimsmj986

The JCESR and molecules (MPcules) collections are quite different. In terms of scope, JCESR was developed to understand and design electrolytes. Other than molecular structures, the main properties that are reported are electrochemical (e.g., ionization energy). In contrast, the MPcules database is more general-purpose. In addition to electrochemical properties, we have vibrational, thermodynamic, electronic, and other properties. In general, the MPcules collection also uses higher levels of theory, though there’s not one single level of theory used for that collection.

kimsmj986 · September 26, 2024, 4:23am

@tschaume @espottesmith
Thank you!

kimsmj986 · September 26, 2024, 8:11am

Hi @espottesmith, I’m trying to get structure data (the x, y, z positions of each atom, or a pymatgen object).

So I used the get_molecule_by_mpculeid function,

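from mp_api.client import MPRester

# API_KEY holds your Materials Project API key
with MPRester(API_KEY, use_document_model=False) as mpr:
    # Look up a single molecule by its MPculeID
    result = mpr.molecules.get_molecule_by_mpculeid(
        mpcule_id='6166b5f8a8350c55f25459c82a4421e9-C3H4O3-0-1'
    )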

with this code, but an error occurred.

Is mpcule_id different from molecule_id?

And is there any way to get structure information for molecules in the MP database?

I would greatly appreciate your assistance.

tschaume · September 27, 2024, 4:41pm

@kimsmj986 Could you share the error message you’re getting? Thanks!