An important data provenance check is that the composition of the ICSD compound should match the composition of the generated POSCAR.
Agreed, we could do this. In general we trust the ICSD, and there’s a limit to what we can do if we get bad CIFs, but it would be good to avoid calculating obviously bad materials if possible.
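As a rough illustration of what such a check could look like, here is a minimal sketch that compares two formulas after reducing them to their smallest integer ratios. This is not how our pipeline works; the function names (`parse_formula`, `reduced_composition`, `compositions_match`) are hypothetical, and it assumes simple formula strings with integer counts and no parentheses. A real check would more likely use pymatgen's `Composition` class on the parsed CIF and POSCAR.

```python
import re
from collections import Counter
from functools import reduce
from math import gcd

# Assumption: formulas are flat strings such as "NaNH2" (no brackets, no
# fractional occupancies). Anything else is rejected rather than guessed at.
_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Return element counts for a flat formula string."""
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        count = int(number) if number else 1
        if count == 0:
            raise ValueError(f"zero count for {element} in {formula!r}")
        counts[element] += count
    return counts


def reduced_composition(formula: str) -> dict:
    """Divide all counts by their greatest common divisor."""
    counts = parse_formula(formula)
    divisor = reduce(gcd, counts.values())
    return {element: n // divisor for element, n in counts.items()}


def compositions_match(source_formula: str, generated_formula: str) -> bool:
    """True if both formulas describe the same reduced composition."""
    return reduced_composition(source_formula) == reduced_composition(generated_formula)


if __name__ == "__main__":
    # The mismatch from this report: the hydrogens are missing.
    print(compositions_match("NaNH2", "NaN"))
    # Same compound written with a different number of formula units.
    print(compositions_match("Na2N2H4", "NaNH2"))
```

Comparing reduced compositions rather than raw counts means a POSCAR containing several formula units would still match its source entry.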
If the compositions don’t match (for example, ICSD_NaNH2 and MP_NaN), then these entries should definitely not be included in the MP database.
This is actually ok. The Materials Project is designed to be resilient against ‘bad’ structures. Many of the structures we calculate are unstable or metastable, and the reported ‘energy above hull’ gives a measure of this stability. Ideally, we wouldn’t calculate obviously unphysical materials, but sometimes we simply don’t know whether a material is reasonable before performing the calculation. If it’s not reasonable, it should have a large reported energy above hull. If you do a query on MP and order by energy above hull, you’ll find a few of these very unstable materials.
We still retain these materials in our database because knowing that a calculation results in an unstable structure can save other people from having to repeat it, so this information is useful to report.
In this case, our hypothetical NaN (mp-1080032) has an energy above hull of 0.174 eV/atom, which, though high, is not too bad. If you look at the structure, however, this is because there has been significant structural relaxation compared to the structure reported in the ICSD.
We will still have to discuss internally how to address this, because the reported tag “sodium amide” is incorrect. Trying to identify these faulty tags would be a good idea, since otherwise they could be misleading. Thanks for the report!