Issues with reproducing e_form for sulphate-type compounds

I queried a large number of structures from MP on 27 Apr 2021 for my PhD research. To validate that the calculations I am running use compatible endpoints, I wanted to check that I could reproduce the e_form property published on the website and via the API. Looking carefully at the energies, for 3222 entries there is a discrepancy between the value I get by applying MaterialsProjectCompatibility to the queried ComputedStructureEntries and the e_form value from the API.

Is it related to this fix: "fix anion correction when S and O are both anions" (materialsproject/pymatgen@9a9e015 on GitHub)? Or does the database build run on an older version of pymatgen?

I have attached a CSV of the ids, compositions, mp_e_f and the e_f I calculate.

mismatch_e_f.csv (41.6 KB)

Does the database build run on an older version of pymatgen?

I would have to check, but the database build runs on a fairly recent version of pymatgen; I believe the latest build used the most recent 2021.x version. Perhaps @rkingsbury (expert in all things correction-related) could comment on this issue.

Hi @CompRhys , when you downloaded the data from MP on April 27, did you reprocess them yourself using MaterialsProjectCompatibility? The new MaterialsProjectCompatibility2020 scheme has recently become the default in pymatgen and is used to process entries in real time when you download them via the Python MPRester(). However, the new corrections are not yet reflected in our database build (we have an imminent release that will change that).

So, the first thing I would do is manually reprocess your entries using MaterialsProjectCompatibility. Those should match what is shown on the website. Alternatively, you can inspect the .energy_adjustments attribute of your ComputedStructureEntry to see which set of corrections has been applied. All new corrections will have 'MP2020' in their description.
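A minimal sketch of both suggestions, assuming a pymatgen version where `MaterialsProjectCompatibility.process_entries` accepts `clean=True`. The helper names (`reprocess_legacy`, `correction_labels`, `uses_mp2020`) are hypothetical, and the stand-in objects at the bottom only mimic the `.energy_adjustments` attribute; their correction names are illustrative, not taken from real entries. The check looks at both the name and the description of each adjustment, since which field carries the 'MP2020' label may depend on the pymatgen version.

```python
from types import SimpleNamespace


def reprocess_legacy(entries):
    """Re-apply the legacy MP corrections to a list of computed entries."""
    # Imported lazily so the inspection helpers below work without pymatgen.
    from pymatgen.entries.compatibility import MaterialsProjectCompatibility

    compat = MaterialsProjectCompatibility()
    # clean=True discards corrections already attached to the entries
    # (e.g. MP2020 ones applied at download time) before applying the
    # legacy set. Entries judged incompatible are left out of the result.
    return compat.process_entries(entries, clean=True)


def correction_labels(entry):
    """Collect the name and description text of each energy adjustment."""
    labels = []
    for adj in getattr(entry, "energy_adjustments", []):
        name = getattr(adj, "name", "") or ""
        desc = getattr(adj, "description", "") or ""
        labels.append(f"{name} {desc}".strip())
    return labels


def uses_mp2020(entry):
    """True if any adjustment on the entry is labelled as an MP2020 correction."""
    return any("MP2020" in label for label in correction_labels(entry))


# Stand-in objects (NOT real pymatgen entries), used only to show the check.
new_style = SimpleNamespace(
    energy_adjustments=[SimpleNamespace(name="MP2020 anion correction (oxide)")]
)
old_style = SimpleNamespace(
    energy_adjustments=[SimpleNamespace(name="legacy anion correction")]
)

print(uses_mp2020(new_style), uses_mp2020(old_style))
```

With real data, one would call `reprocess_legacy(entries)` on the downloaded entries and then confirm that `uses_mp2020` is false for every returned entry before building the PhaseDiagram.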

So initially I didn’t reprocess them and I got this histogram (blue is e_form from the API, orange is what I get from the PhaseDiagram with MP2020):


Then I reprocessed with MaterialsProjectCompatibility, as I guessed that was the issue, and got this, which matches for ~95% of entries but still shows these sulphate issues for the entries above (the histogram isn’t a perfect match).
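For anyone wanting to repeat this kind of check, here is a rough sketch of how the mismatches could be flagged from a table like the attached one. The column names (`id`, `composition`, `mp_e_f`, `e_f`), the 1 meV/atom tolerance, the helper names and the three sample rows are all assumptions made up for illustration; they are not values from mismatch_e_f.csv.

```python
import csv
import io
import re


def find_mismatches(rows, tol=1e-3):
    """Return rows whose two formation energies differ by more than tol (eV/atom)."""
    out = []
    for row in rows:
        diff = float(row["e_f"]) - float(row["mp_e_f"])
        if abs(diff) > tol:
            out.append({**row, "diff": diff})
    return out


def contains_s_and_o(formula):
    """True if the formula string contains both S and O as elements."""
    # Element symbols are one capital letter plus an optional lowercase letter,
    # so "Sc" or "Os" are not mistaken for "S" or "O".
    elements = set(re.findall(r"[A-Z][a-z]?", formula))
    return {"S", "O"} <= elements


# Made-up sample data in the assumed column layout.
sample = """id,composition,mp_e_f,e_f
example-1,Na2SO4,-2.000,-2.050
example-2,NaCl,-2.100,-2.100
example-3,MgO,-3.000,-3.0004
"""

rows = list(csv.DictReader(io.StringIO(sample)))
mismatches = find_mismatches(rows)
sulphate_like = [m["id"] for m in mismatches if contains_s_and_o(m["composition"])]

print(len(mismatches), "mismatches,", len(sulphate_like), "contain both S and O")
```

Splitting the mismatches by whether S and O are both present is one quick way to see if the disagreement is confined to sulphate-type compositions.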


I did some digging and couldn’t find where the discrepancy came from. We think it might be due to the fix we found on GitHub but have not tested this, although the fix on GitHub is several years old.

My advisor suggested that we stick to the old compatibility until the new compatibility is published in a journal. This is why I wanted to try to understand this small discrepancy in e_form despite the imminent deprecation of MaterialsProjectCompatibility. There’s probably nothing to fix here if you’re updating the database with MP2020 soon.

OK thanks, this helps. I’m skeptical that the GitHub fix you found is related though, because it is several years old and should therefore already be in use when we build our database. So after reprocessing, the only mismatches are the 604 entries in your .csv file?

And thanks for your patience with the new corrections. We do hope to have a manuscript documenting them on a preprint server within about 1 week.


I think the 604 might just be the structures on the hull where I had the discrepancy, as we wanted to get an idea of whether the discrepancy was affecting the stabilities. I thought I had attached the full list and now can’t find it, but I can regenerate it if helpful.

I look forward to reading it!

Hi @CompRhys , the manuscript is up:

Also, if you haven’t seen it, we have now released a database build using these new corrections. Release notes are here.
