I’d like to generate a list of formulas from MP and their associated band gaps. I am not specific about the material type or domain. I suspect I need a substantial number of (material, property) pairs. What is the best way of doing this?
So let’s say that all I need is a list of 1000 records of the form ["material_id", "formula_pretty", "band_gap"].
Is there a way of doing this efficiently?
Efficient ways to go about it depend on how often you need to retrieve the (same or a different) chunk of 1000 materials. See our docs for details on how to set up the mp-api Python client, and also consult this page for tips and tricks to be aware of. If you only need a list of any 1000 materials once, you can use the num_chunks and chunk_size arguments:
from mp_api.client import MPRester

fields = ["material_id", "formula_pretty", "band_gap"]

with MPRester(APIKEY) as mpr:
    # retrieve a single chunk of 1000 summary documents, requesting only the needed fields
    docs = mpr.materials.summary.search(
        fields=fields, chunk_size=1000, num_chunks=1
    )
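A query can also be added to narrow the results. As a minimal sketch, assuming you only wanted materials with a nonzero band gap (band_gap is one of the range filters accepted by summary.search; the 0.1–5.0 eV window here is purely illustrative):

from mp_api.client import MPRester

fields = ["material_id", "formula_pretty", "band_gap"]

with MPRester(APIKEY) as mpr:
    # same chunked retrieval, restricted to band gaps between 0.1 and 5 eV
    docs = mpr.materials.summary.search(
        band_gap=(0.1, 5.0),
        fields=fields,
        chunk_size=1000,
        num_chunks=1,
    )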
Rerunning the first snippet will return the same 1000 materials unless a query is added. This quickly becomes inefficient. Assuming that you’d like to repeatedly generate random chunks of 1000 materials from MP, I’d suggest retrieving the fields you need for all materials once and saving them to a local file (make sure to update the file when new MP data releases come out). You can subsequently reuse the file to generate a randomized list of 1000 materials as often as needed. For instance:
import gzip

import orjson
from mp_api.client import MPRester

fields = ["material_id", "formula_pretty", "band_gap"]

# use_document_model=False and monty_decode=False return plain dicts,
# which serialize directly with orjson
with MPRester(APIKEY, use_document_model=False, monty_decode=False) as mpr:
    docs = mpr.materials.summary.search(fields=fields)

# dump the full list of documents to a compressed JSON file
option = orjson.OPT_NAIVE_UTC | orjson.OPT_SERIALIZE_NUMPY
dumped = orjson.dumps(docs, option=option)

fn = "mp_docs.json.gz"
with gzip.open(fn, "wb") as f:
    f.write(dumped)

# later: reload the cached documents from disk
with gzip.open(fn, "rb") as f:
    docs = orjson.loads(f.read())

# use the list of materials in `docs` to randomly select 1000
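For that last step, a minimal sketch using random.sample from the standard library (the seed is optional and only there for reproducibility):

import random

# reproducible random subset of 1000 (material_id, formula_pretty, band_gap) records
random.seed(42)
subset = random.sample(docs, k=1000)

Rerunning this with a different seed (or no seed) gives a fresh random chunk of 1000 without hitting the API again.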
HTH