Download Legacy MP database

Dear MP Team,

I was wondering if there is an easy way to download the legacy version of the MP database (mainly the structures), as the legacy API appears to be no longer functional.

The reason I’m asking is that many older papers utilize MPIDs that were based on the legacy API, and even some MPContribs databases refer to MPIDs from the legacy version (without storing the bulk structures in the MPContribs database itself).

As far as I can tell, many of them cannot be accessed with the new API, even when one sets deprecated=True (which yields about 10,000 deprecated MPIDs). As examples, the following IDs are no longer accessible (as far as I can tell): mvc-13350, mvc-16068, mvc-13133, mp-867303 (and I believe there are many more). They can still be accessed on the legacy website, but not with the new API (unless I am doing something wrong).

Thank you!

Best regards,
Peter

Hi @peterschindler, all of the older calculations are still accessible via the nextgen API, but many will only be accessible through the tasks endpoint. For example, the materials you listed are still available like so:

from mp_api.client import MPRester

with MPRester() as mpr:
    tasks = mpr.materials.tasks.search(task_ids=['mvc-13350', 'mvc-16068', 'mvc-13133', 'mp-867303'])

Which task defines the structure of a material can drift over time (for example, when we update the method used to calculate a property). So even for the legacy data, the structure referenced by an MPContribs entry may have changed since the entry was created, and it won’t be easy to figure out what the original structure was.

I’d recommend against using the legacy API, but it is accessible and can be queried programmatically - the easiest way to do so would probably be:

  1. pip install pymatgen==2022.7.25 (or another similarly old version of pymatgen)
  2. Run something like this to obtain the legacy data:
from pymatgen.ext.matproj import MPRester as MPResterLegacy
from pymatgen.core import Structure

LEGACY_API_KEY = ""  # should be a 19-character string.
query = {"formula": "FeO"}

# Request only the material ID and the CIF text for each match.
with MPResterLegacy(LEGACY_API_KEY) as mpr:
    materials = mpr.query(query, ["material_id", "cif"], mp_decode=False)

# Parse each CIF string into a pymatgen Structure, keyed by MPID.
structures_by_mpid = {
    doc["material_id"]: Structure.from_str(doc["cif"], fmt="cif")
    for doc in materials
}

There’s no guarantee this will work reliably. Searching the task collection with the nextgen API (mp_api) is the preferred and supported route.

Hi @Aaron_Kaplan,
Thanks for your help. That’s great - I’ll use the tasks endpoint then to retrieve the old structures.
Could you please explain in a bit more detail what you mean by “drift” in the structures? Would these be small changes in atomic positions and/or lattice parameters (and hence a potential change in the spacegroup symmetry), or could these entail larger deviations?

It might still be worth it to consider enabling users to download the entire database in the last version of the legacy API (which has been around for quite a while, and hence it’s more likely that derivative databases used this specific version of the MP). If I remember correctly, you do enable the download via AWS, but I am not sure if this specific (old) version can be downloaded as a whole (especially after September).

Thanks again!
-Peter

If we re-relax a structure and the changes are significant enough that it no longer has the same symmetry, then we would classify the resultant structure as a new material with a different material ID.

A “material” in the Materials Project isn’t fixed by a set of constraints; e.g., we don’t define mp-149 to be diamond cubic Si. Instead, mp-149 labels the group of all structures in MP that are symmetrically equivalent to one another, and the material ID is the lowest identifier among the task IDs in that group.
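To make the grouping rule concrete, here is a minimal sketch in plain Python. It assumes that some structure-matching step has already assigned each task to a group of symmetrically equivalent structures, and that "lowest identifier" means the smallest integer suffix of the task ID. The function names, the group labels, and every task ID other than mp-149 are hypothetical; this is not the actual MP build code.

```python
from collections import defaultdict


def task_number(task_id):
    """Return the integer suffix of an ID such as 'mp-149'.

    Using this suffix as the ordering key is an assumption of this sketch.
    """
    return int(task_id.rsplit("-", 1)[1])


def assign_material_ids(group_of_task):
    """Map each material ID to the ordered task IDs in its equivalence group.

    `group_of_task` maps task ID -> group label, where the label stands in
    for the outcome of a structure-matching step that is not shown here.
    """
    groups = defaultdict(list)
    for task_id, group_label in group_of_task.items():
        groups[group_label].append(task_id)

    materials = {}
    for task_ids in groups.values():
        ordered = sorted(task_ids, key=task_number)
        # The lowest task ID in the group becomes the material ID.
        materials[ordered[0]] = ordered
    return materials


# Invented grouping, for illustration only.
group_of_task = {
    "mp-1200": "group_a",
    "mp-149": "group_a",
    "mp-5000": "group_a",
    "mp-2000": "group_b",
    "mp-300": "group_b",
}
materials = assign_material_ids(group_of_task)
print(materials)
```

Under these assumptions, a re-relaxation that moves a task into a different group would also change which material ID that task sits under, which is one way to picture the drift described above.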

Hope that makes sense, happy to explain further if it’s not clear.

For the legacy data: agreed that we should offer a bulk download. I’ll send an update once that’s available!

Understood. That makes sense. Thanks again so much for the support and explanation!

-Peter