Saving MPDocs offline

Hello!

I am having trouble saving MPDocs in a format that can be loaded back as MPDocs when working offline.

I have seen this discussed before, but none of the methods mentioned there have worked for me.

Here is my code to download all available MPDocs (using my API key):

from mp_api.client import MPRester
mpr = MPRester(api_key)
docs = mpr.materials.summary.search()

Gathering the docs succeeds, and I then try to dump them with:

from monty.serialization import dumpfn, loadfn
from emmet.core.utils import jsanitize
sanitized_docs = jsanitize(docs)
dumpfn(sanitized_docs, "mp_docs.json.gz")

The dump completes without errors, but loading the file fails:

loaded_docs = loadfn("mp_docs.json.gz")

File /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/__init__.py:293, in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    274 def load(fp, *, cls=None, object_hook=None, parse_float=None,
    275         parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    276     """Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
    277     a JSON document) to a Python object.
    278
    (...)
    291     kwarg; otherwise ``JSONDecoder`` is used.
    292     """
--> 293     return loads(fp.read(),
    294         cls=cls, object_hook=object_hook,
    295         parse_float=parse_float, parse_int=parse_int,
    296         parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)

    563                            hex(self._crc)))
    564     elif isize != (self._stream_size & 0xffffffff):
    565         raise BadGzipFile("Incorrect length of data produced")

BadGzipFile: CRC check failed 0xa0452f91 != 0x3e6a3d16
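Note that the exception is raised by the gzip layer (the trailer check that produces BadGzipFile), not by the JSON parser: Python's gzip module verifies the CRC32 and uncompressed length stored at the end of the file only once the whole stream has been read. A minimal stdlib sketch for checking whether an archive is intact before trying to parse it (the helper name gzip_is_intact is hypothetical, not part of monty or mp_api):

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses cleanly to the end.

    Hypothetical helper. Reading to EOF forces gzip to verify the CRC32 and
    length trailer, so a corrupted or truncated archive fails here before
    any JSON decoding is attempted.
    """
    try:
        with gzip.open(path, "rb") as f:
            # Discard the decompressed bytes; only successful decoding matters.
            while f.read(chunk_size):
                pass
    except FileNotFoundError:
        # A missing file is a different problem; let it propagate.
        raise
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile subclasses OSError; EOFError signals a truncated stream.
        return False
    return True
```

If gzip_is_intact("mp_docs.json.gz") returns False, the file on disk is damaged or was not written completely, which would point away from the sanitization step and toward how the file was written or copied.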

Any ideas on how to save MPDocs locally so that I can load them back as MPDocs offline? Thanks!

Thanks for reaching out! Were you able to resolve this?

While trying to reproduce this, I ran into a pydantic validation error, and the jsanitize/dumpfn round trip was very slow. @munrojm and I will take a closer look at that.

What did work well for me was disabling document models and monty decoding in MPRester and serializing with orjson:

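import orjson
import gzip

from mp_api.client import MPRester

# Return plain dicts instead of document models, and skip monty decoding
with MPRester("api-key", use_document_model=False, monty_decode=False) as mpr:
    docs = mpr.materials.summary.search()

# Naive datetimes are serialized as UTC; numpy arrays are serialized natively
option = orjson.OPT_NAIVE_UTC | orjson.OPT_SERIALIZE_NUMPY
dumped = orjson.dumps(docs, option=option)  # orjson.dumps returns bytes

fn = "mp_docs.json.gz"
with gzip.open(fn, "wb") as f:
    f.write(dumped)

# Reading the file back yields a list of plain dicts
with gzip.open(fn, "rb") as f:
    docs = orjson.loads(f.read())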
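This round trip gives back plain dicts rather than MPDocs. If typed objects are needed again, one option is to rebuild them from the dicts after loading. With emmet-core installed, constructing emmet.core.summary.SummaryDoc from each dict may work for summary docs, although the pydantic validation error mentioned above could resurface there (untested here). The stdlib-only sketch below shows the same idea with a hypothetical, trimmed-down MiniSummaryDoc dataclass; its three fields are an assumption for illustration, not the real schema. Because orjson writes standard JSON, the stdlib json module can read the file as well.

```python
import gzip
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MiniSummaryDoc:
    """Hypothetical stand-in for a summary document (NOT emmet's SummaryDoc)."""

    material_id: str
    formula_pretty: str
    band_gap: Optional[float] = None

    @classmethod
    def from_dict(cls, d):
        # Keep only keys that match declared fields; real docs carry many more.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in d.items() if k in names})


def load_docs(path):
    """Load a gzipped JSON list of dicts and rebuild typed objects from it."""
    with gzip.open(path, "rb") as f:
        raw = json.loads(f.read())  # json.loads accepts bytes
    return [MiniSummaryDoc.from_dict(d) for d in raw]
```

Dropping unknown keys keeps the sketch tolerant of fields it does not model; a pydantic model such as SummaryDoc would apply its own validation rules instead.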

HTH