Save the SummaryDoc as a JSON File

mprester_time · November 12, 2022, 10:32pm

Hi everyone,
I am inexperienced with APIs, so maybe this has a really simple solution.
I was trying to save the data that I get through the summary.search() function in JSON format. I am doing this because I want to pass the summary data into a Matlab script I am working on, and thought JSON would be the easiest way to do so. I am open to trying other formats if they’re more helpful.

I use the following code:

import sys
from mp_api.client import MPRester

list_of_properties=["energy_above_hull",'band_gap','decomposes_to','uncorrected_energy_per_atom','formation_energy_per_atom','material_id','nelements',"density","elements","formula",'is_stable',"uncorrected_energy",'composition','composition_reduced','structure']

# The next line lets me pass a material_id on the command line when I run this Python script
mat_id = sys.argv.pop()

# MP_API_KEY holds the API key and is defined elsewhere
with MPRester(MP_API_KEY) as mpr:
    your_material = mpr.summary.search(material_ids=[mat_id], fields=list_of_properties, all_fields=False)[0]
your_material.json()

and get the following error message:
TypeError: Object of type ‘Composition’ is not JSON serializable

I have tried other ways of achieving this, for example:

material_summary=your_material.dict()
json.dumps(material_summary)

which causes the error
TypeError: Object of type Element is not JSON serializable

So if anyone has any tips or suggestions for me, I would greatly appreciate it!

mkhorton · November 14, 2022, 7:44pm

Hi @mprester_time,

You can use monty (which is already installed if you are using the MPRester):

from monty.serialization import loadfn, dumpfn

dumpfn(your_object, "your_filename.json") # or .json.gz

This uses the MontyEncoder and MontyDecoder to automatically convert the relevant objects, like Composition, to JSON.
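
To illustrate what that conversion amounts to, here is a minimal stdlib-only sketch of the general idea. It is not monty's actual implementation: `FakeComposition`, `AsDictEncoder`, and the id "mp-0000" are hypothetical names made up for this example, and the sketch assumes the object exposes an `as_dict()` method, as pymatgen objects do.

```python
import json


class FakeComposition:
    """Hypothetical stand-in for an object that json cannot serialize."""

    def __init__(self, amounts):
        self.amounts = dict(amounts)

    def as_dict(self):
        # Return a plain dict that json already knows how to write.
        return dict(self.amounts)


class AsDictEncoder(json.JSONEncoder):
    """Simplified illustration of an encoder that falls back to as_dict()."""

    def default(self, o):
        # json calls default() only for objects it cannot serialize itself.
        if hasattr(o, "as_dict"):
            return o.as_dict()
        return super().default(o)


# Made-up document; "mp-0000" is a placeholder, not a real material id.
doc = {"material_id": "mp-0000", "composition": FakeComposition({"O": 1.0})}

try:
    json.dumps(doc)
    plain_dumps_failed = False
except TypeError:
    # Same class of error as in the original post.
    plain_dumps_failed = True

encoded = json.dumps(doc, cls=AsDictEncoder)
decoded = json.loads(encoded)
```

Note that `decoded` contains only plain dicts, lists, strings, and numbers; turning those back into rich objects is the job of a decoder, which this sketch leaves out.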

Hope this helps!

Matt

mdigennaro · August 23, 2023, 2:34pm

Hello @mkhorton, I followed your answer and I get the documents saved as dict, rather than pydantic.main.MPDataDoc. Once that is done, I am not able to retrieve the Molecule object from the saved dict. The code is below. Thank you

from monty.serialization import loadfn, dumpfn
from mp_api.client import MPRester
api_key = '***'

import os
json_file = 'data.json'
os.remove(json_file)

with MPRester(api_key) as mpr:
    data = mpr.molecules.search( 
            chemsys = 'O',
            fiaelds=['composition']
    )

dumpfn(data, json_file)
read_data = loadfn(json_file)

print(type(data[0]))
print(type(read_data[0]))

print(data[0])
print(read_data[0].keys())

# Read Molecule
from pymatgen.core.structure import Molecule
Molecule.from_dict(read_data[0])

Retrieving MoleculeDoc documents: 100% 13/13 [00:00<00:00, 976.08it/s]
<class 'pydantic.main.MPDataDoc'>
<class 'dict'>
MPDataDoc<MoleculeDoc>
composition=Composition('O1')

Fields not requested:
['builder_meta', 'charge', 'spin_multiplicity', 'natoms', 'elements', 'nelements', 'nelectrons', 'composition_reduced', 'formula_alphabetical', 'formula_pretty', 'formula_anonymous', 'chemsys', 'symmetry', 'molecule_id', 'molecule', 'deprecated', 'deprecation_reasons', 'initial_molecules', 'task_ids', 'deprecated_tasks', 'calc_types', 'last_updated', 'created_at', 'origins', 'warnings', 'species', 'molecules', 'molecule_levels_of_theory', 'species_hash', 'coord_hash', 'inchi', 'inchi_key', 'task_types', 'levels_of_theory', 'solvents', 'lot_solvents', 'unique_calc_types', 'unique_task_types', 'unique_levels_of_theory', 'unique_solvents', 'unique_lot_solvents', 'entries', 'best_entries', 'constituent_molecules', 'similar_molecules']
dict_keys(['composition', 'fields_not_requested', '@module', '@class', '@version'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[18], line 26
     24 # Read Molecule
     25 from pymatgen.core.structure import Molecule
---> 26 Molecule.from_dict(read_data[0])

File ~/miniconda3/envs/kubas/lib/python3.10/site-packages/pymatgen/core/structure.py:3221, in IMolecule.from_dict(cls, d)
   3209 @classmethod
   3210 def from_dict(cls, d) -> IMolecule | Molecule:
   3211     """
   3212     Reconstitute a Molecule object from a dict representation created using
   3213     as_dict().
   (...)
   3219         Molecule object
   3220     """
-> 3221     sites = [Site.from_dict(sd) for sd in d["sites"]]
   3222     charge = d.get("charge", 0)
   3223     spin_multiplicity = d.get("spin_multiplicity")

KeyError: 'sites'

tschaume · August 23, 2023, 5:22pm

There’s a typo in your arguments to mpr.molecules.search: fiaelds should be fields. HTH

mdigennaro · August 24, 2023, 8:34am

True, but that is not the error. If I fix that the KeyError: 'sites' remains.

tschaume · August 24, 2023, 8:54pm

I think you might have to upgrade pymatgen. @janosh might be able to help and provide details for this error.

janosh · August 24, 2023, 9:05pm

@mdigennaro There are several issues here:

1. The chemsys keyword was dropped. You probably want to use elements instead.
2. You’re probably using an outdated API client. Run pip install -U mp-api to update.
3. You can’t create Molecules from the returned data since you’re only requesting the composition, hence the KeyError: 'sites'.

This is probably what you want to do:

from mp_api.client import MPRester

with MPRester() as mpr:
    data = mpr.molecules.search(elements="O", num_chunks=1, chunk_size=1)

print(data[0].molecule)

You can call Molecule.as_dict() or Molecule.to('file') directly on the returned objects.

mdigennaro · August 28, 2023, 1:28pm

Hello @janosh and thanks for the support,
I cloned the materialsproject/api git repository (the new API client for the Materials Project) from GitHub a few days ago.

The query I am trying to run is for all binary hydrides (such as C-H systems).
I find chemsys very convenient here, since I can query ‘*-H’.
How can I formulate the query without chemsys?

Regarding the fields, I tried downloading all of them, but if I save the result to a JSON file and reload it, I get a dictionary:

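import os
from monty.serialization import loadfn, dumpfn
from mp_api.client import MPRester

chemsys = 'Li-H'
json_file = f"{chemsys}.json"

# Remove a stale file from a previous run, if there is one
if os.path.exists(json_file):
    os.remove(json_file)

# api_key is defined as in my earlier post
with MPRester(api_key) as mpr:
    data = mpr.molecules.search(chemsys=chemsys)

dumpfn(data, json_file)

data = loadfn(json_file)
data[0].molecule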

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[37], line 2
      1 data = loadfn(json_file)
----> 2 data[0].molecule

AttributeError: 'dict' object has no attribute 'molecule'

BR
Marco

janosh · August 28, 2023, 3:38pm

Use

Molecule.from_dict(data[0]["molecule"])
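
The reason for key access: after loadfn, each document comes back as a plain dict (see the `<class 'dict'>` output earlier in this thread), so fields are read as `doc["molecule"]` rather than `doc.molecule`. Below is a small stdlib-only sketch of a guard that gives a clearer message when a field was never requested. `get_field` is a hypothetical helper, not part of mp-api, monty, or pymatgen, and the sample dicts are made up to resemble the keys printed above.

```python
def get_field(doc, name):
    """Return doc[name], or raise a descriptive KeyError if it is absent.

    Hypothetical helper, not part of mp-api, monty, or pymatgen.
    """
    if name in doc:
        return doc[name]
    if name in doc.get("fields_not_requested", []):
        raise KeyError(f"'{name}' was not requested in the original search")
    raise KeyError(f"'{name}' is not a field of this document")


# Made-up documents shaped like the reloaded dicts shown in this thread.
composition_only = {
    "composition": {"O": 1.0},
    "fields_not_requested": ["molecule", "charge"],
}
full_doc = {"molecule": {"sites": [], "charge": 0}, "fields_not_requested": []}

# Key access works on the reloaded dict; attribute access would not.
molecule_dict = get_field(full_doc, "molecule")

try:
    get_field(composition_only, "molecule")
    message = ""
except KeyError as err:
    message = err.args[0]
```

With real data, `molecule_dict` is what you would pass to Molecule.from_dict.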

I’m not sure why the chemsys keyword was removed. Maybe @munrojm can comment.

tsmathis · February 22, 2024, 11:32pm

Thread closed due to inactivity, please open a new thread to address related issues.