Unique entries across all the databases

Hi all,

How many unique compounds can we get across all the databases in OPTIMADE?

You can find a list of all the databases that support, or are in the process of implementing, the OPTIMADE API at https://www.optimade.org/providers-dashboard/

The databases that support OPTIMADE hold about 16.5 million entries in total.
It is hard to say how many of these structures are unique. I expect that a considerable number of materials have more than one entry: different research groups can study the same material and each generate an entry for it. Common materials in particular are likely to appear many times across the different databases.
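One rough way to arrive at such a total is to request a single page of structures from each provider and add up the `meta.data_available` field that the OPTIMADE specification defines for responses. The sketch below assumes the responses have already been fetched and parsed into dictionaries; the base URL, provider names, and numbers are placeholders, not real counts.

```python
def structures_url(base_url: str) -> str:
    """Build a minimal structures query; page_limit=1 keeps the payload small."""
    return f"{base_url.rstrip('/')}/v1/structures?page_limit=1"


def total_entries(responses: dict) -> int:
    """Sum meta.data_available over parsed OPTIMADE responses.

    Providers that do not report the field are skipped.
    """
    total = 0
    for response in responses.values():
        available = response.get("meta", {}).get("data_available")
        if isinstance(available, int):
            total += available
    return total


# Placeholder responses (hypothetical providers and counts).
sample = {
    "provider_a": {"meta": {"data_available": 1200}},
    "provider_b": {"meta": {"data_available": 300}},
    "provider_c": {"meta": {}},  # field not reported
}

print(structures_url("https://example.org/optimade/"))
print(total_entries(sample))
```

A sum like this counts entries, not unique materials, so it only gives an upper bound.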

Thank you!

"Unique compound" is actually quite a vague term. If we speak about unique material phases, the following criteria together define them:

(a) unique composition +
(b) unique crystalline symmetry (i.e. space group) +
(c) unique number of atoms in the unit cell (i.e. the last part of the Pearson symbol).

By my estimate, the total number of unique material phases known experimentally is quite moderate, about 300k. However, to confirm this number one would need to analyze all the OPTIMADE structures against (a)-(c), which is not a trivial task.
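As a minimal sketch of how criteria (a)-(c) could be applied, the entries can be grouped by a key made of the three values. The field names (`formula`, `space_group`, `atoms_per_cell`) and the sample records are illustrative assumptions; real data would first need the formula normalized and the symmetry determined for a consistent cell setting.

```python
from collections import defaultdict


def phase_key(entry: dict) -> tuple:
    """Key combining (a) composition, (b) space group, (c) atoms per unit cell."""
    return (entry["formula"], entry["space_group"], entry["atoms_per_cell"])


def group_by_phase(entries: list) -> dict:
    """Map each phase key to the ids of all entries that share it."""
    groups = defaultdict(list)
    for entry in entries:
        groups[phase_key(entry)].append(entry["id"])
    return dict(groups)


# Illustrative records only; ids and database names are hypothetical.
entries = [
    {"id": "db1-1", "formula": "NaCl", "space_group": 225, "atoms_per_cell": 8},
    {"id": "db2-7", "formula": "NaCl", "space_group": 225, "atoms_per_cell": 8},
    {"id": "db1-2", "formula": "NaCl", "space_group": 221, "atoms_per_cell": 2},
    {"id": "db3-4", "formula": "SiO2", "space_group": 152, "atoms_per_cell": 9},
]

phases = group_by_phase(entries)
print(len(entries), "entries,", len(phases), "unique phases")
```

Here the two rock-salt entries collapse into one phase, while the same composition in a different space group stays separate.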

@blokhin or @JPBergsma, any estimate of how many would contain CIFs?

I do not know which databases have CIF files, but the pymatgen Python package (https://pymatgen.org/) contains a function to convert an OPTIMADE response to the CIF format.
There is a tutorial on how to do this in the Jupyter notebook found here: https://hub.gke2.mybinder.org/user/materials-conso-orial-exercises-v249pxuc/doc/tree/notebooks/demonstration-pymatgen-for-optimade-queries.ipynb

CIF is just one of the possible formats for a crystal structure, much as a picture can be stored as PNG, JPG, WEBP, GIF, etc. You can easily convert between OPTIMADE JSON and CIF in either direction. To the best of my knowledge, native CIF support is provided by COD, MP, and MPDS, but they serve CIFs only via their legacy APIs, not via the OPTIMADE API. The main data format for OPTIMADE is JSON.
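To illustrate the JSON to CIF direction, here is a minimal sketch that writes a P1 CIF from the `lattice_vectors`, `cartesian_site_positions`, and `species_at_sites` attributes of an OPTIMADE structure. It assumes a fully periodic structure, that the species names are plain element symbols, and that there is no disorder or partial occupancy; symmetry is not detected. The sample structure is a toy cell, not data from any database.

```python
import math


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def _angle(u, v):
    """Angle between two vectors in degrees."""
    cosine = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(cosine))


def optimade_to_cif(attributes: dict, name: str = "structure") -> str:
    """Write a P1 CIF string from OPTIMADE structure attributes."""
    a, b, c = attributes["lattice_vectors"]
    volume = _dot(a, _cross(b, c))
    # Rows that turn a Cartesian position r into fractional coordinates,
    # e.g. the first coordinate is r . (b x c) / V.
    to_frac = [[x / volume for x in _cross(b, c)],
               [x / volume for x in _cross(c, a)],
               [x / volume for x in _cross(a, b)]]
    lines = [
        f"data_{name}",
        "_symmetry_space_group_name_H-M 'P 1'",
        "_symmetry_Int_Tables_number 1",
        f"_cell_length_a {math.sqrt(_dot(a, a)):.6f}",
        f"_cell_length_b {math.sqrt(_dot(b, b)):.6f}",
        f"_cell_length_c {math.sqrt(_dot(c, c)):.6f}",
        f"_cell_angle_alpha {_angle(b, c):.6f}",
        f"_cell_angle_beta {_angle(a, c):.6f}",
        f"_cell_angle_gamma {_angle(a, b):.6f}",
        "loop_",
        "_atom_site_label",
        "_atom_site_type_symbol",
        "_atom_site_fract_x",
        "_atom_site_fract_y",
        "_atom_site_fract_z",
    ]
    sites = zip(attributes["species_at_sites"],
                attributes["cartesian_site_positions"])
    for index, (symbol, position) in enumerate(sites, start=1):
        frac = [_dot(position, row) for row in to_frac]
        coords = " ".join(f"{value:.6f}" for value in frac)
        lines.append(f"{symbol}{index} {symbol} {coords}")
    return "\n".join(lines) + "\n"


# Toy cubic cell for illustration only.
toy = {
    "lattice_vectors": [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]],
    "cartesian_site_positions": [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]],
    "species_at_sites": ["Cs", "Cl"],
}
cif = optimade_to_cif(toy, name="toy_cell")
print(cif)
```

For real work a maintained library such as pymatgen is the safer choice, since it also handles symmetry and the many edge cases of the CIF format.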
