Search by Date Added

I would like to be able to use the Explorer to search structures by date added, specifically so that I can see which structures have been added recently.

Thanks for reaching out! That’s a good feature request and has been on our minds for a while.

As part of modernizing our entire data processing pipeline, we’re publishing an increasing selection of MP data products in our AWS OpenData repositories. While only the latest version of our build data is available through the API, the “build” repository contains the underlying data for every (recent) release in file format. Specifically, each build collection contains a manifest file that can be used to figure out a list of new materials between releases.

So, at least for a programmatic approach, @tsmathis can provide a solution today. Once our technology choices have settled, we’ll integrate the approach into the mp-api client and subsequently the website. HTH

@shyuep, using the manifests that @tschaume mentioned will get you the differences (at least in terms of new entry ids) between data releases in just a couple of lines of code.

For new, non-deprecated materials in the current release (v2025.04.10) vs. the previous release (v2025.02.12.post1):

import pandas as pd

current_data_release = (
    "s3://materialsproject-build/collections/2025-04-10/summary/manifest.jsonl.gz"
)
previous_data_release = (
    "s3://materialsproject-build/collections/2025-02-12-post1/summary/manifest.jsonl.gz"
)

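# Each manifest is a gzip-compressed JSON Lines file with one record per entry.
current_manifest = pd.read_json(current_data_release, lines=True, compression="gzip")
previous_manifest = pd.read_json(previous_data_release, lines=True, compression="gzip")

# Ids that are non-deprecated in the current release but were not
# non-deprecated in the previous release (new, or previously deprecated).
new_material_ids = set(
    current_manifest.loc[current_manifest.deprecated == False].material_id
) - set(previous_manifest.loc[previous_manifest.deprecated == False].material_id)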

I believe you will need s3fs installed for pandas to automatically handle the "s3://..." paths.
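
If installing pandas and s3fs is not an option, the same set difference can be sketched with the standard library alone. This is only an illustration, not the approach used above: it assumes the two manifest files have already been downloaded to local paths, and that each JSON line carries the `material_id` and `deprecated` fields that the pandas snippet relies on. The function names `active_material_ids` and `new_material_ids` are made up for this example.

```python
import gzip
import json


def active_material_ids(manifest_path):
    """Return the set of non-deprecated material ids in a local manifest.jsonl.gz."""
    ids = set()
    with gzip.open(manifest_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Keep only records explicitly flagged as not deprecated,
            # mirroring the `deprecated == False` filter in the pandas version.
            if record.get("deprecated") is False:
                ids.add(record["material_id"])
    return ids


def new_material_ids(current_path, previous_path):
    """Ids that are active in the current manifest but not in the previous one."""
    return active_material_ids(current_path) - active_material_ids(previous_path)
```

As in the pandas version, an id that was deprecated in the previous release and is active in the current one counts as new.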

From there you can use the material ids with the mp_api client to programmatically get the relevant materials. The same can be done for each collection for all data releases back to v2024.11.14.
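
A rough sketch of those two follow-up steps, under some assumptions: the path layout for other releases and collections is extrapolated from the two `summary` URLs above and has not been checked against the bucket, and the helper names `manifest_url` and `fetch_summaries` are hypothetical. The lookup uses the `MPRester` summary search from mp_api, with the field list chosen only as an example; the import is deferred so that the URL helper works without mp_api installed.

```python
# Assumed layout, generalized from the two "summary" manifest URLs above.
MANIFEST_TEMPLATE = (
    "s3://materialsproject-build/collections/{release}/{collection}/manifest.jsonl.gz"
)


def manifest_url(release, collection="summary"):
    """Build a manifest URL for a release folder such as '2025-04-10'."""
    return MANIFEST_TEMPLATE.format(release=release, collection=collection)


def fetch_summaries(material_ids, api_key=None, fields=("material_id", "formula_pretty")):
    """Fetch summary documents for the given ids via the mp_api client."""
    # Imported lazily so this module loads even without mp_api installed.
    from mp_api.client import MPRester

    # With api_key=None the client falls back to its own configuration
    # (for example the MP_API_KEY environment variable).
    with MPRester(api_key) as mpr:
        return mpr.materials.summary.search(
            material_ids=sorted(material_ids),
            fields=list(fields),
        )
```

For example, `manifest_url("2025-02-12-post1")` reproduces the previous-release URL used above, and `fetch_summaries(new_material_ids)` would return the summary documents for the new ids.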
