Traversing the Materials Project Database for Perovskite Compounds

I would like to traverse the entire Materials Project database and retrieve the material IDs of all compounds with a perovskite structure.

Our project involves studying perovskite materials and their properties, and we believe that the Materials Project database contains valuable data for our research. However, we have encountered challenges in efficiently identifying and accessing all the compounds with a perovskite structure in the new API.

We have explored various approaches using the Materials Project API, including querying specific data based on criteria and searching for perovskite-related information. However, since the perovskite structure is a complex property and not directly searchable as a simple field, we would greatly appreciate your guidance on how we can accomplish this task effectively.

This response by @munrojm to a similar question might help.

Thank you. I was able to retrieve 154,718 documents, but only 618 of them had ‘perovskites’ in their tags or remarks. I believe there are more.

We employed another method using the following code:

from mp_api.client import MPRester
# API key left blank, as in the original post
with MPRester("") as mpr:
    docs = mpr.robocrys.search(keywords=["perovskite"])

This allowed us to pull 9,284 documents. However, most of these entries seem outdated: they no longer exist in the database.

@Meeno15 could you clarify what you mean by outdated?

By ‘outdated,’ I mean that many of the documents retrieved via the second method seem to be no longer current within the database. This could be due to various factors, including updates to research data, changes in document classification, or the removal of duplicates or inaccurate entries. Therefore, even though we initially retrieved 9,284 documents, it appears that a significant portion of these do not represent the most current and relevant data on ‘perovskites.’ I hope this clarifies what I meant by ‘outdated.’ Please let me know if you have any other questions.

I have just tested this and see what you mean. It looks as though some of the material_ids for the results have been fully deprecated. You can still access the data, but you have to pass in deprecated=True (e.g. docs = mpr.summary.search(task_ids=mpids, deprecated=True, fields=["material_id"])) with the resulting mpids list from robocrys.
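As a minimal sketch of how one might find out which IDs from the robocrys results are deprecated: query the summary endpoint once with the default settings, collect the IDs that did not come back, and query those again with deprecated=True. The helper names and the `api_key` argument are hypothetical, and the sketch assumes `summary.search` accepts `material_ids` (as used later in this thread) rather than `task_ids`; the client is imported lazily so the pure helper can be used without it.

```python
def split_current_and_missing(requested_ids, returned_ids):
    """Partition requested IDs into those the API returned and those it did not.

    Duplicates in requested_ids are dropped; first-seen order is kept.
    """
    returned = {str(i) for i in returned_ids}
    current, missing = [], []
    for mpid in dict.fromkeys(str(i) for i in requested_ids):
        (current if mpid in returned else missing).append(mpid)
    return current, missing


def find_deprecated_ids(api_key, mpids):
    """Return the IDs in mpids that only resolve when deprecated=True.

    Assumption: summary.search takes material_ids, deprecated and fields.
    """
    from mp_api.client import MPRester  # imported lazily; needs mp-api installed

    with MPRester(api_key) as mpr:
        live = mpr.summary.search(material_ids=mpids, fields=["material_id"])
        _, missing = split_current_and_missing(
            mpids, [doc.material_id for doc in live]
        )
        if not missing:
            return []
        old = mpr.summary.search(
            material_ids=missing, deprecated=True, fields=["material_id"]
        )
    return [str(doc.material_id) for doc in old]
```

Any ID that appears in neither query is not explained by deprecation and would need to be checked separately.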

– Jason

Using the robocrys endpoint with ‘perovskite’, we found 9,284 materials, including 3,870 duplicates. The provenance endpoint yielded 618 unique materials. Merging both and removing duplicates, we found an extra 228 duplicates. Adding deprecated=True didn’t change the results.

We aim to get a comprehensive dataset of perovskite materials. Given the data discrepancy and duplicates, we wonder if there’s more data we aren’t accessing, or if we should use a different retrieval method. Any guidance would be appreciated.

I have noted the duplicate data you are getting and will check on that. For a strict perovskite filter, I would recommend pulling all robocrys docs with the keyword search, extracting all unique MPIDs from the docs, and passing those to MPRester.summary.search(material_ids=list_of_extracted_ids) to get your data. The provenance tags aren’t fully updated at the moment. This will also ensure you get only non-deprecated data.
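The three steps described here (keyword search, unique MPIDs, summary lookup) could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe: `unique_material_ids`, `fetch_perovskite_summaries` and `api_key` are hypothetical names, the robocrys docs are assumed to expose a `material_id` attribute, and the requested `fields` are only an example.

```python
def unique_material_ids(docs):
    """Return material IDs from a list of docs, de-duplicated, in first-seen order.

    Accepts either objects with a material_id attribute or plain dicts.
    """
    seen = {}
    for doc in docs:
        mpid = doc["material_id"] if isinstance(doc, dict) else doc.material_id
        seen.setdefault(str(mpid), None)
    return list(seen)


def fetch_perovskite_summaries(api_key):
    """Keyword search on robocrys, then fetch summary docs for the unique IDs."""
    from mp_api.client import MPRester  # imported lazily; needs mp-api installed

    with MPRester(api_key) as mpr:
        robo_docs = mpr.robocrys.search(keywords=["perovskite"])
        mpids = unique_material_ids(robo_docs)
        # With default settings only non-deprecated entries are returned.
        return mpr.summary.search(
            material_ids=mpids, fields=["material_id", "formula_pretty"]
        )
```

De-duplicating before the summary call keeps the request small and makes the final count directly comparable with the number of unique IDs.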

– Jason


Thank you for suggesting the method of extracting unique MPIDs from the robocrys docs using the ‘perovskite’ keyword, and then fetching summary data. Using this approach, we were able to extract 3,620 unique perovskite materials.
However, our project requires a larger dataset. Could you suggest any other methods or endpoints we could use to gather more perovskite data? Alternatively, do you have plans to update the provenance tags or add more perovskite entries to the database in the near future?
Thank you again for your continued assistance.

Unfortunately, we don’t currently have a more sophisticated way of tagging perovskite structures within the build pipeline. If you want to find others that we may miss, you will have to pull all material structures and come up with a metric of your own to check.
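One very crude starting point for such a metric is a stoichiometry pre-filter: keep only formulas that reduce to the ABX3 ratio 1:1:3 before doing any structural comparison. The sketch below is only that pre-filter, with hypothetical function names; it handles plain formulas without parentheses or fractional amounts, and it says nothing about the actual crystal structure, so it will pass non-perovskites such as CaCO3 and miss perovskite-derived compositions with other ratios.

```python
import re
from functools import reduce
from math import gcd

# Element symbol followed by an optional integer count, e.g. "Sr", "O3".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def reduced_counts(formula):
    """Return the sorted, gcd-reduced element counts of a simple formula."""
    counts = {}
    for element, number in _TOKEN.findall(formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    if not counts:
        return []
    divisor = reduce(gcd, counts.values())
    return sorted(n // divisor for n in counts.values())


def has_abx3_stoichiometry(formula):
    """True if the formula has exactly three elements in a 1:1:3 ratio."""
    return reduced_counts(formula) == [1, 1, 3]
```

Candidates that pass this filter would still need a structure-level check (for example, comparing each structure against a reference perovskite) before being counted.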

– Jason

Thread closed due to inactivity, please open a new thread to address related issues.