Best practices for heavy API usage?

rpw199912j · April 27, 2022, 7:03pm

Hi,

I intend to screen 1004 chemical systems containing aluminum (e.g., Al-Li-Zn, Al-Mg-Cu…) and then retrieve the ElasticityDocs for all the qhull_entries within each chemical system. I plan to carry out this search with the new mp-api package.

I’ve sent a request to [email protected], but have not heard back yet.

Here is the code snippet I intend to use. Could you please advise on best practices and timing for this search? (e.g., how long to pause between API calls; whether to use aggregate queries instead of for-loops; whether I should run the search around midnight …)

import time
from mp_api import MPRester
from mp_api.core.client import MPRestError
from pymatgen.analysis.phase_diagram import PhaseDiagram

# read the API key from a txt file, stripping any trailing newline or whitespace
with open("api_key.txt") as f:
    API_KEY = f.read().strip()


def query_one_chemsys(chemsys: str) -> dict:
    """Helper function to carry out the query for one chemsys"""
    chemsys_lst = sorted(chemsys.split("-"))
    # get all ComputedStructureEntry within the given chemsys
    with MPRester(api_key=API_KEY) as mpr:
        phase_diagram_entries = mpr.get_entries_in_chemsys(chemsys_lst)

    # check whether any entries were returned
    if len(phase_diagram_entries) == 0:
        raise MPRestError("No valid entries")

    phase_diagram = PhaseDiagram(entries=phase_diagram_entries)
    # check if the dimensionality of the retrieved phase diagram matches the input chemsys
    if phase_diagram.dim != len(chemsys_lst):
        raise MPRestError("Entries dimensionality mismatch")

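    # get the entries used to build the convex hull (qhull_entries);
    # note that these are not necessarily all on the hull (see stable_entries)
    most_stable_entries = phase_diagram.qhull_entries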

    # get the mp-id for each entry in most_stable_entries
    mp_ids = [entry.entry_id for entry in most_stable_entries]

    # retrieve the elasticity doc for each mp-id
    material_elasticity_docs = []
    with MPRester(api_key=API_KEY) as mpr:
        for mp_id in mp_ids:
            try:
                material = mpr.elasticity.get_data_by_id(mp_id)
                material_elasticity_docs.append(material)
            except MPRestError:
                material_elasticity_docs.append(None)

    # store the elasticity data within each ComputedStructureEntry in a dictionary format
    mp_docs_to_store: list = []
    for entry, elasticity_doc in zip(most_stable_entries, material_elasticity_docs):
        entry_dict = entry.as_dict()
        entry_dict["data"]["elasticity"] = elasticity_doc.dict()["elasticity"] if elasticity_doc is not None else None
        mp_docs_to_store.append(entry_dict)
    return {chemsys: mp_docs_to_store}


chemsys_to_screen: list = ["Al-Li-Zn", "Al-Mg-Cu"]
screening_results = []
chemsys_with_errors = []
for chemsys in chemsys_to_screen:
    try:
        single_chemsys_result = query_one_chemsys(chemsys)
        screening_results.append(single_chemsys_result)
    except MPRestError:
        chemsys_with_errors.append(chemsys)
    time.sleep(10)

Best
Peiwen

munrojm · May 3, 2022, 4:58pm

Hi @rpw199912j

Sorry for the delayed response. You should be fine with this; there is no need to do anything special.

– Jason
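
For readers who still want to pace or batch requests, as the original question asks about, here is a minimal, generic sketch. It is not official Materials Project guidance and does not call mp-api at all: `chunked` and `call_with_retry` are hypothetical helper names, and the retry count and delays are arbitrary assumptions.

```python
import time
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: List[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def call_with_retry(
    func: Callable[[], R],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Call `func`, retrying on any exception with exponentially growing pauses.

    The pause before retry k (0-based) is base_delay * 2**k seconds.
    The last failure is re-raised once `retries` attempts are used up.
    """
    for attempt in range(retries):
        try:
            return func()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("retries must be at least 1")
```

In the snippet from the question, the call inside the `for chemsys in chemsys_to_screen` loop could be wrapped as `call_with_retry(lambda: query_one_chemsys(chemsys))`, and `chunked(mp_ids, 50)` could split a long ID list into smaller groups; the batch size of 50 is only an illustration.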

rpw199912j · May 3, 2022, 6:15pm

@munrojm Thanks for the clarification!

Also, is there a way to suppress the progress bar that appears when retrieving ThermoDocs and ElasticityDocs?

Best
Peiwen

munrojm · May 3, 2022, 6:26pm

@rpw199912j, yes there is. Set the MPRESTER_MUTE_PROGRESS_BARS environment variable to True. That should mute all progress bars.

– Jason
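
A minimal sketch of setting that variable from within Python rather than in the shell. It assumes the client reads the environment when it is imported or instantiated, so the assignment is placed before any mp_api import; the string value "True" mirrors the wording in the reply above.

```python
import os

# Set the variable before importing mp_api, on the assumption that the
# client reads its settings from the environment at import or creation time.
os.environ["MPRESTER_MUTE_PROGRESS_BARS"] = "True"

# The mp_api import and the MPRester usage from the original snippet
# would follow from here.
```

The equivalent in a POSIX shell would be to export the variable before launching the script.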