Energy above hull values in materials project

Dear Materials Project community,

How are the energy above the convex hull values calculated in Materials Project?

I was assuming it is calculated using the following procedure:
For a compound A, find all other compounds whose elements form a subset of A’s elements, then construct a phase diagram using pymatgen.CompoundPhaseDiagram and calculate ehull and decomposition using this function.
(Let’s call the values it returns PMG-ehull, in contrast to values retrieved from materials project, MP-ehull)

While usually PMG-ehull is consistent with MP-ehull, in a few (thousand?) cases they are inconsistent. Here is a notebook of my implementation.

Am I misunderstanding the calculation procedure?

Hi @qai222, welcome!

You need to use pymatgen.analysis.phase_diagram.PhaseDiagram not CompoundPhaseDiagram, the latter sets the compounds to zero formation enthalpy by definition, and the hull values for the latter are then quoted relative to the compounds. If the compounds are unstable/metastable, you will then get very different answers.

All Materials Project hulls are calculated using the pymatgen PhaseDiagram, so the results are 1-to-1 equivalent.

Hope this helps!

Matt

Thank you Matt for clarifying this. I think I actually used pymatgen.analysis.phase_diagram.PhaseDiagram in my notebook

from schema import Compound  # class for compounds in MP
from pymatgen.analysis.phase_diagram import PhaseDiagram
from pymatgen.core.composition import Composition
from pymatgen.entries.computed_entries import ComputedEntry

# wrapper for pymatgen EATCH calculator
def find_pmgehull(
        reactant: Compound, products: list[Compound],
):
    c_e = ComputedEntry(Composition(reactant.normalized_formula), reactant.formation_energy_per_atom,
                        entry_id=reactant.mpid)
    cp_es = []
    for cp in products:
        cp_es.append(ComputedEntry(Composition(cp.normalized_formula), cp.formation_energy_per_atom, entry_id=cp.mpid))
    pdEntries = [c_e, ] + cp_es
    pd = PhaseDiagram(pdEntries)
    decomp, ehull = pd.get_decomp_and_e_above_hull(c_e)
    return ehull

Using this find_pmgehull I was able to get consistent MP-ehull and PMG-ehull for ~121 K compounds in Materials Project, but they are inconsistent for the remaining ~5.9 K compounds. In my notebook, I showed how they can be made consistent by removing an entry from the phase diagram (for that particular case).

Could you share the code used in Materials Project for defining PhaseDiagram and for calculating ehull?

Could you share the code used in Materials Project for defining PhaseDiagram and for calculating ehull?

All the code for building the Materials Project database is public in our emmet code, see: emmet/thermo.py at 8462d3639740a6fa0ab2e089c8631e2326ade510 · materialsproject/emmet · GitHub

Be careful, if you’re constructing hulls yourself, not to include deprecated data.

Thank you very much for the pointer.

If you’re constructing hulls yourself, not to include deprecated data.

This is my current code to retrieve entries from MP

from pymatgen.ext.matproj import MPRester
mat_api_key = "MY-API-KEY"
mpr = MPRester(mat_api_key)
all_compounds = mpr.query({}, properties=["task_id", "pretty_formula", 'e_above_hull',
                                              'elements', 'volume', 'formation_energy_per_atom', 'band_gap',
                                              'nsites', 'unit_cell_formula', 'energy_per_atom'])

How can I retrieve entries from Materials Project without deprecated data?

FWIW Here is my attempt to reproduce e_above_hull values for the 463 compounds from LiFeSbO system using get_decomp_and_e_above_hull.

import pandas as pd
from pymatgen.analysis.phase_diagram import PhaseDiagram
from pymatgen.core.composition import Composition
from pymatgen.entries.computed_entries import ComputedEntry

LiFeSbO_df = pd.read_csv("LiFeSbO.csv")  # data retrieved using MPRester
print("# of compounds:", LiFeSbO_df.shape[0])

# load entries
def record_to_entry(record:dict, use_fe=True) -> ComputedEntry:
    if use_fe:
        energy=record["formation_energy_per_atom"]
    else:
        energy=record["energy_per_atom"]
    return ComputedEntry(composition=Composition(record["normalized_formula"]), energy=energy, entry_id=record["mpid"], data={"mp-ehull": record["e_above_hull"]} )

def compare(entries):
    diagram = PhaseDiagram(entries)
    print("mpid\tpmg-ehull\tmp-ehull")
    diff = []
    for entry in entries:
        decomp, pmg_ehull = diagram.get_decomp_and_e_above_hull(entry)
        mp_ehull = entry.data["mp-ehull"]
        if abs(pmg_ehull - mp_ehull) > 1e-1:
            print("{}\t{:.3f}\t{:.3f}".format(entry.entry_id, pmg_ehull, mp_ehull))
            diff.append(entry)
    print(len(diff))
    return diff

if __name__ == '__main__':

    entries_fe = [record_to_entry(record, use_fe=True) for record in LiFeSbO_df.to_dict(orient="records")]
    diff_fe = compare(entries_fe)

And I got this:

# of compounds: 463
mpid	pmg-ehull	mp-ehull
mp-1318211	10.402	0.044
mp-756153	3.468	0.055
mp-756791	9.646	0.055
mp-1177978	8.426	0.361
mp-1304822	10.408	0.050
mp-1638623	10.399	0.041
mp-765615	5.782	0.027
mp-771508	7.330	0.072
mp-1275686	10.400	0.042
mp-768602	6.560	0.040
mp-756107	6.450	0.008
mp-754807	6.555	0.002
mp-774920	6.553	0.000
mp-771712	6.444	0.000
mp-1177408	6.253	0.093
mp-1307244	10.400	0.042
mp-780303	8.648	0.015
mp-754805	10.440	0.081
mp-772131	9.171	0.164
mp-769870	6.587	0.046
mp-778012	0.000	0.432
mp-756320	8.631	0.000
mp-774583	6.447	0.002
mp-770854	4.774	0.073
mp-1295199	10.423	0.065
mp-1181437	1.546	1.053
mp-1062652	0.273	0.417
mvc-13234	0.275	1.011
mp-715572	0.656	0.116
mp-759037	0.071	0.350
mp-1188678	0.110	1.830
mp-715614	0.169	0.468
mp-1178247	0.145	0.470
mp-1194978	0.201	1.544
mp-557546	0.537	0.667
mp-612405	0.190	0.876
mp-716814	0.513	0.704
mp-1178239	0.148	0.360
mp-1078361	0.239	0.704
mvc-12905	0.200	0.354
mp-694910	0.146	0.784
mvc-5967	0.194	0.388
mp-1181766	0.184	0.475
mp-25236	0.276	0.613
mvc-11999	0.375	0.720
mp-1097003	0.277	1.147
mvc-13181	0.336	0.757
mp-863766	0.091	1.296
mp-756693	0.105	0.601
mvc-12125	0.312	0.828
mp-1181546	0.087	1.017
mp-850222	0.147	0.405
mp-1225001	0.188	0.446
mp-1182249	0.069	0.982
mp-850511	0.083	0.768
mp-780312	0.081	0.259
mp-1178108	0.233	0.349
mp-1176726	0.061	0.778
mp-780207	0.080	0.255
mp-758077	0.074	0.937
mp-1176945	0.072	0.557
mp-586092	0.060	0.484
mp-1177704	0.108	0.367
mp-762600	0.099	0.534
mp-780188	0.036	0.199
mp-752883	0.240	0.636
mp-759022	0.062	0.694
mp-759144	0.107	0.323
mp-758515	0.128	0.283
mp-752950	0.055	0.198
mp-758794	0.067	0.478
mp-754977	0.041	0.379
mp-752935	0.112	0.428
mp-760142	0.083	0.268
mp-771571	0.076	0.665
mp-770886	0.062	0.361
mp-780216	0.053	0.483
mp-1102442	0.958	0.740
mp-1194030	0.180	0.498
79

In addition to the inconsistency, I’m getting some crazy >10 eV/atom numbers, mainly because of mp-778012 which has a formation energy of -13.569 eV/atom. Is this one of the deprecated entries?

1 Like