Getting structures by material ID in a for loop not finishing

Hi,

I am trying to get structures using MPRester().
I have around 4000 material IDs I am interested in, and the following is the code.

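from pymatgen.core.structure import IStructure
from pymatgen.ext.matproj import MPRester
structures = []
with MPRester() as mpr:
    for mat_id in df['id_discharge']:
        structure = mpr.get_structure_by_material_id(mat_id, final=False)
        IStructure.add_oxidation_state_by_guess(structure)
        structures.append(structure)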

Even after running overnight, it has not finished, and there is no error. Is it supposed to take this long, or is there something wrong with the code?

Please help.
Thanks,

Hi @Chae-Ho_Yim,

For this number of materials, it would probably be easier to run:

from pymatgen.ext.matproj import MPRester
with MPRester() as mpr:
    docs = mpr.query({'material_id': {'$in': all_mat_ids}}, ['structure', 'material_id'])

where all_mat_ids is a list of the material IDs of interest.
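Building on the query above, here is a minimal sketch that sends the IDs in fixed-size batches (deduplicated, order preserved) instead of one very large `$in` list. It assumes `mpr` is any object with the `query(criteria, properties)` method used above; `chunks`, `query_in_batches`, and the batch size of 500 are hypothetical choices for illustration, not part of the MPRester API.

```python
def chunks(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def query_in_batches(mpr, material_ids, properties, batch_size=500):
    """Run one `$in` query per batch and concatenate the returned docs.

    `mpr` is assumed to expose query(criteria, properties) like the legacy
    MPRester. Duplicate IDs are dropped while preserving their first order.
    """
    unique_ids = list(dict.fromkeys(material_ids))
    docs = []
    for batch in chunks(unique_ids, batch_size):
        docs.extend(mpr.query({"material_id": {"$in": batch}}, properties))
    return docs


# Hypothetical usage (needs pymatgen and a Materials Project API key):
# from pymatgen.ext.matproj import MPRester
# with MPRester() as mpr:
#     docs = query_in_batches(mpr, all_mat_ids, ["structure", "material_id"])
```

Smaller batches make it easier to see progress and to retry a single failed batch without repeating the whole request.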

However, the line IStructure.add_oxidation_state_by_guess(structure) is probably what’s taking the most time, not the API retrieval.
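One way to check where the time actually goes is to time the two steps separately on a small subset before launching the full run. A minimal sketch using only the standard library; `profile_steps` is a hypothetical helper, and `fetch`/`decorate` are whatever callables you pass in. `decorate` is expected to modify the object in place, as `add_oxidation_state_by_guess` does on a `Structure`.

```python
import time


def profile_steps(ids, fetch, decorate):
    """Run fetch + decorate for each ID and time the two steps separately.

    Returns (results, seconds_fetching, seconds_decorating).
    """
    results = []
    fetch_s = decorate_s = 0.0
    for mat_id in ids:
        t0 = time.perf_counter()
        item = fetch(mat_id)          # e.g. the API call
        t1 = time.perf_counter()
        decorate(item)                # e.g. the oxidation-state guess
        t2 = time.perf_counter()
        fetch_s += t1 - t0
        decorate_s += t2 - t1
        results.append(item)
    return results, fetch_s, decorate_s


# Hypothetical usage on the first 20 IDs (needs pymatgen and an API key):
# with MPRester() as mpr:
#     _, t_fetch, t_oxi = profile_steps(
#         list(df['id_discharge'])[:20],
#         lambda m: mpr.get_structure_by_material_id(m, final=False),
#         lambda s: s.add_oxidation_state_by_guess(),
#     )
#     print(f"fetch: {t_fetch:.1f} s, oxidation guess: {t_oxi:.1f} s")
```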

Best,

Matt

Hi, Matt.

I tried what you recommended, but docs is returning an empty list. Could it be because there is no property called ‘structure’?

I went through MPRester().supported_properties and supported_task_properties, and neither includes structure.

Many thanks.
Chae-Ho

Hi @Chae-Ho_Yim, “structure” is definitely a supported property, could you paste your code?

Thanks, I figured it out.

But I do have another problem.
I noticed that MPRester also has options to obtain either the initial or the final structure.
Whichever option I use, I get 2900 structures, whereas my list has 4400 material IDs. When I was using the for loop, I could obtain all 4400 structures.

Is there a reason for the difference? I have checked for duplicate material IDs, and there weren’t any.

Thanks.

This is difficult to diagnose without a code example; both methods should give equivalent results.

Thanks, Matt.

Here is the code:

import pandas as pd
from pymatgen.ext.matproj import MPRester

df = pd.read_csv("MP_battery_info.csv")
df = df.rename({'id_discharge': 'material_id'}, axis=1)
# Plain Python list of IDs for the $in query
all_mat_id = df['material_id'].tolist()
# properties = ['material_id', 'initial_structure', 'energy_per_atom', 'spacegroup', 'e_above_hull',
#               'band_gap']

properties = ['material_id', 'initial_structure']

with MPRester() as mpr:
    docs = mpr.query({"material_id": {"$in": all_mat_id}}, properties)
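The query returns a list of dicts, which is not guaranteed to follow the order of all_mat_id. A minimal sketch that indexes the returned docs by material_id so each result can be matched back to its row in df, with None where nothing came back; `docs_by_id` and `align_to_ids` are hypothetical helper names.

```python
def docs_by_id(docs, key="material_id"):
    """Index a list of query result dicts by their material ID."""
    return {doc[key]: doc for doc in docs}


def align_to_ids(material_ids, docs, field):
    """Return `field` for each requested ID, or None if no doc came back."""
    index = docs_by_id(docs)
    return [index[m][field] if m in index else None for m in material_ids]


# Hypothetical usage with the query above:
# df['initial_structure'] = align_to_ids(all_mat_id, docs, 'initial_structure')
# print(df['initial_structure'].isna().sum(), "IDs returned no doc")
```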

For the for loop:

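from pymatgen.ext.matproj import MPRester

structures = []
with MPRester() as mpr:
    for matid in df['material_id']:
        # final=False returns the initial (pre-relaxation) structure
        stru = mpr.get_structure_by_material_id(matid, final=False)
        structures.append(stru)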

Please let me know if you need the CSV file; it contains all the information from the battery IDs.

Thanks,

It looks like it should be fine, but why are you asking for the initial structure and not the final structure (e.g. just structure, or final=True)?

It was the only way I could get the structures. I believe some of the material IDs in the CSV didn’t have a final structure, which caused an error. Setting final=False was the only way I could run the loop and get the information.
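Rather than switching every lookup to final=False, the loop could try the final structure first, fall back to the initial one, and record the IDs where both fail. A minimal sketch; `fetch_with_fallback` is hypothetical and `get_structure` stands in for `mpr.get_structure_by_material_id`. The exact exception raised for a missing structure depends on the pymatgen version, so the sketch defaults to catching `Exception`; narrowing `errors` to the client’s own error class (e.g. `MPRestError` in the legacy client) would be tighter.

```python
def fetch_with_fallback(ids, get_structure, errors=(Exception,)):
    """Fetch final structures, falling back to initial ones per ID.

    `get_structure(mat_id, final=...)` mirrors the keyword used by
    MPRester.get_structure_by_material_id. Returns (structures, sources,
    failed), where sources[mat_id] is "final" or "initial".
    """
    structures, sources, failed = {}, {}, []
    for mat_id in ids:
        for final in (True, False):
            try:
                structures[mat_id] = get_structure(mat_id, final=final)
                sources[mat_id] = "final" if final else "initial"
                break
            except errors:
                continue
        else:
            # Neither the final nor the initial structure could be fetched
            failed.append(mat_id)
    return structures, sources, failed


# Hypothetical usage:
# with MPRester() as mpr:
#     structs, src, failed = fetch_with_fallback(
#         df['material_id'], mpr.get_structure_by_material_id)
```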

The only thing I don’t understand is this: when I use MPRester.query with initial_structure in the properties, shouldn’t I get the same number of structures, as you said?
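A quick way to narrow down a gap like 2900 vs. 4400 is to list exactly which requested IDs have no matching doc, then spot-check a few of them with the single-ID call. A minimal sketch; `missing_ids` is hypothetical and assumes each doc has a `material_id` key, which it will if `material_id` is among the requested properties.

```python
def missing_ids(requested, docs, key="material_id"):
    """Return requested IDs (deduplicated, original order) absent from docs."""
    returned = {doc[key] for doc in docs}
    seen = set()
    missing = []
    for mat_id in requested:
        if mat_id not in returned and mat_id not in seen:
            missing.append(mat_id)
        seen.add(mat_id)
    return missing


# Hypothetical usage:
# gone = missing_ids(all_mat_id, docs)
# print(len(gone), gone[:10])
# with MPRester() as mpr:
#     print(mpr.get_structure_by_material_id(gone[0], final=False))
```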
Thanks,

Try properties=["structure", "material_id"]