Energy above hull values in materials project

Dear Materials Project community,

How are the energy above the convex hull values calculated in Materials Project?

I was assuming it is calculated using the following procedure:
For a compound A, find all other compounds whose elements form a subset of A’s elements, then construct a phase diagram using pymatgen.CompoundPhaseDiagram and calculate ehull and decomposition using this function.
(Let’s call the values it returns PMG-ehull, in contrast to values retrieved from materials project, MP-ehull)

While usually PMG-ehull is consistent with MP-ehull, in a few (thousand?) cases they are inconsistent. Here is a notebook of my implementation.

Am I misunderstanding the calculation procedure?

Hi @qai222, welcome!

You need to use pymatgen.analysis.phase_diagram.PhaseDiagram not CompoundPhaseDiagram, the latter sets the compounds to zero formation enthalpy by definition, and the hull values for the latter are then quoted relative to the compounds. If the compounds are unstable/metastable, you will then get very different answers.

All Materials Project hulls are calculated using the pymatgen PhaseDiagram, so the results are 1-to-1 equivalent.

Hope this helps!

Matt

Thank you Matt for clarifying this. I think I actually used pymatgen.analysis.phase_diagram.PhaseDiagram in my notebook

from schema import Compound  # class for compounds in MP
from pymatgen.analysis.phase_diagram import PhaseDiagram
from pymatgen.core.composition import Composition
from pymatgen.entries.computed_entries import ComputedEntry

# wrapper for pymatgen EATCH calculator
def find_pmgehull(
        reactant: Compound, products: list[Compound],
):
    c_e = ComputedEntry(Composition(reactant.normalized_formula), reactant.formation_energy_per_atom,
                        entry_id=reactant.mpid)
    cp_es = []
    for cp in products:
        cp_es.append(ComputedEntry(Composition(cp.normalized_formula), cp.formation_energy_per_atom, entry_id=cp.mpid))
    pdEntries = [c_e, ] + cp_es
    pd = PhaseDiagram(pdEntries)
    decomp, ehull = pd.get_decomp_and_e_above_hull(c_e)
    return ehull

Using this find_pmgehull I was able to get consistent MP-ehull and PMG-ehull for ~121 K compounds in Materials Project, but they are inconsistent for the remaining ~5.9 K compounds. In my notebook, I showed how they can be made consistent by removing an entry from the phase diagram (for that particular case).

Could you share the code used in Materials Project for defining PhaseDiagram and for calculating ehull?

Could you share the code used in Materials Project for defining PhaseDiagram and for calculating ehull?

All the code for building the Materials Project database is public in our emmet code, see: emmet/emmet-core/emmet/core/thermo.py at 8462d3639740a6fa0ab2e089c8631e2326ade510 · materialsproject/emmet · GitHub

Be careful, if you’re constructing hulls yourself, not to include deprecated data.

Thank you very much for the pointer.

If you’re constructing hulls yourself, not to include deprecated data.

This is my current code to retrieve entries from MP

from pymatgen.ext.matproj import MPRester
mat_api_key = "MY-API-KEY"
mpr = MPRester(mat_api_key)
all_compounds = mpr.query({}, properties=["task_id", "pretty_formula", 'e_above_hull',
                                              'elements', 'volume', 'formation_energy_per_atom', 'band_gap',
                                              'nsites', 'unit_cell_formula', 'energy_per_atom'])

How can I retrieve entries from Materials Project without deprecated data?

FWIW Here is my attempt to reproduce e_above_hull values for the 463 compounds from LiFeSbO system using get_decomp_and_e_above_hull.

import pandas as pd
from pymatgen.analysis.phase_diagram import PhaseDiagram
from pymatgen.core.composition import Composition
from pymatgen.entries.computed_entries import ComputedEntry

LiFeSbO_df = pd.read_csv("LiFeSbO.csv")  # data retrieved using MPRester
print("# of compounds:", LiFeSbO_df.shape[0])

# load entries
def record_to_entry(record:dict, use_fe=True) -> ComputedEntry:
    if use_fe:
        energy=record["formation_energy_per_atom"]
    else:
        energy=record["energy_per_atom"]
    return ComputedEntry(composition=Composition(record["normalized_formula"]), energy=energy, entry_id=record["mpid"], data={"mp-ehull": record["e_above_hull"]} )

def compare(entries):
    diagram = PhaseDiagram(entries)
    print("mpid\tpmg-ehull\tmp-ehull")
    diff = []
    for entry in entries:
        decomp, pmg_ehull = diagram.get_decomp_and_e_above_hull(entry)
        mp_ehull = entry.data["mp-ehull"]
        if abs(pmg_ehull - mp_ehull) > 1e-1:
            print("{}\t{:.3f}\t{:.3f}".format(entry.entry_id, pmg_ehull, mp_ehull))
            diff.append(entry)
    print(len(diff))
    return diff

if __name__ == '__main__':

    entries_fe = [record_to_entry(record, use_fe=True) for record in LiFeSbO_df.to_dict(orient="records")]
    diff_fe = compare(entries_fe)

And I got this:

# of compounds: 463
mpid	pmg-ehull	mp-ehull
mp-1318211	10.402	0.044
mp-756153	3.468	0.055
mp-756791	9.646	0.055
mp-1177978	8.426	0.361
mp-1304822	10.408	0.050
mp-1638623	10.399	0.041
mp-765615	5.782	0.027
mp-771508	7.330	0.072
mp-1275686	10.400	0.042
mp-768602	6.560	0.040
mp-756107	6.450	0.008
mp-754807	6.555	0.002
mp-774920	6.553	0.000
mp-771712	6.444	0.000
mp-1177408	6.253	0.093
mp-1307244	10.400	0.042
mp-780303	8.648	0.015
mp-754805	10.440	0.081
mp-772131	9.171	0.164
mp-769870	6.587	0.046
mp-778012	0.000	0.432
mp-756320	8.631	0.000
mp-774583	6.447	0.002
mp-770854	4.774	0.073
mp-1295199	10.423	0.065
mp-1181437	1.546	1.053
mp-1062652	0.273	0.417
mvc-13234	0.275	1.011
mp-715572	0.656	0.116
mp-759037	0.071	0.350
mp-1188678	0.110	1.830
mp-715614	0.169	0.468
mp-1178247	0.145	0.470
mp-1194978	0.201	1.544
mp-557546	0.537	0.667
mp-612405	0.190	0.876
mp-716814	0.513	0.704
mp-1178239	0.148	0.360
mp-1078361	0.239	0.704
mvc-12905	0.200	0.354
mp-694910	0.146	0.784
mvc-5967	0.194	0.388
mp-1181766	0.184	0.475
mp-25236	0.276	0.613
mvc-11999	0.375	0.720
mp-1097003	0.277	1.147
mvc-13181	0.336	0.757
mp-863766	0.091	1.296
mp-756693	0.105	0.601
mvc-12125	0.312	0.828
mp-1181546	0.087	1.017
mp-850222	0.147	0.405
mp-1225001	0.188	0.446
mp-1182249	0.069	0.982
mp-850511	0.083	0.768
mp-780312	0.081	0.259
mp-1178108	0.233	0.349
mp-1176726	0.061	0.778
mp-780207	0.080	0.255
mp-758077	0.074	0.937
mp-1176945	0.072	0.557
mp-586092	0.060	0.484
mp-1177704	0.108	0.367
mp-762600	0.099	0.534
mp-780188	0.036	0.199
mp-752883	0.240	0.636
mp-759022	0.062	0.694
mp-759144	0.107	0.323
mp-758515	0.128	0.283
mp-752950	0.055	0.198
mp-758794	0.067	0.478
mp-754977	0.041	0.379
mp-752935	0.112	0.428
mp-760142	0.083	0.268
mp-771571	0.076	0.665
mp-770886	0.062	0.361
mp-780216	0.053	0.483
mp-1102442	0.958	0.740
mp-1194030	0.180	0.498
79

In addition to the inconsistency, I’m getting some crazy >10 eV/atom numbers, mainly because of mp-778012 which has a formation energy of -13.569 eV/atom. Is this one of the deprecated entries?

1 Like

Hi,
How to find the energy above hull for the random compound that is not existed in the Materials Project ? I have a DFT analysis results of those random compounds.

It’s done by PhaseDiagram.get_decomp_and_e_above_hull method. You need to create ComputedEntry from your calculations, just like I did in the previous post.

Hello,

Apologies for bumping an old post, but I’m interested in knowing how to calculate the e_above_hull using my own DFT-calculated data.

Below is the code I used, and following that, the error message that I received.

from pymatgen.ext.matproj import MPRester
from pymatgen.analysis.phase_diagram import PhaseDiagram, PDPlotter, CompoundPhaseDiagram
from pymatgen.entries.computed_entries import ComputedEntry
from pymatgen.core import Composition
# from pymatgen.analysis.pourbaix_diagram import PourbaixDiagram, PourbaixEntry, PourbaixAnalyzer # StableEntry
# from pymatgen.analysis.phase_diagram import PhaseDiagram, PDAnalyzer, ConvexHullAnalyzer
# from pymatgen.analysis.pour import ConvexHullAnalyzer

# Assuming you have a list of computed entries (e.g., from a database)
entries = [
    ComputedEntry(Composition("Y4N4"), -164.774553),
    ComputedEntry(Composition("Y16Ti16O56"), -1377.12118691),
    ComputedEntry(Composition("Y8Ti4O20"), -532.80516856),
    ComputedEntry(Composition("Y4Ti4O8N4"), -322.86938277),
    ComputedEntry(Composition("Y2"), -55.73665798),
    ComputedEntry(Composition("N8"), -74.18410792),
    ComputedEntry(Composition("O6"), -36.28556130),
    ComputedEntry(Composition("Ti3"), -48.18069328),

] # List of entries with structure and energy

pd = PhaseDiagram(entries)

# Create a ConvexHullAnalyzer
# hull_analyzer = ConvexHullAnalyzer(entries)

# For a specific compound, e.g., La2Ti2O7
compound = Composition("Y4Ti4O8N4")
energy_above_hull = pd.get_e_above_hull(compound)

print(f"Energy above hull for {compound}: {energy_above_hull:.4f} eV")

And this is the error that I get:

/usr/local/lib/python3.10/dist-packages/pymatgen/analysis/phase_diagram.py in get_decomp_and_e_above_hull(self, entry, allow_negative, check_stable, on_error)
    758         except Exception as exc:
    759             if on_error == "raise":
--> 760                 raise ValueError(f"Unable to get decomposition for {entry}") from exc
    761             if on_error == "warn":
    762                 warnings.warn(f"Unable to get decomposition for {entry}, encountered {exc}")

ValueError: Unable to get decomposition for Y4 Ti4 O8 N4

I have several questions regarding this code and the science/equations behind formation energy, decomposition energy and energy above hull. I am inexperienced in both Python codes and materials thermodynamic stabilities, so forgive me if the questions are somewhat vague or confusing.

  1. Is there a way I can modify this code to make this work? Should I include more reference structures? What other calculations am I missing in order to calculate the energy above hull?

  2. What does the equation of energy above hull looks like, and how does it differ with the equation of formation energy?

Thank you for your time!

Edit 1:
I managed to fix the error by modifying a certain part of the code into this:

# For a specific compound, e.g., La2Ti2O7
compound = ComputedEntry(Composition("Y4Ti4O8N4"), -322.86938277) 

But now I get this input, and I am unsure on how to interpret it. Is it saying that the material is on the convex hull, or am I using the wrong reference materials which results into this?

Available compositions in entries:
Y4 N4
Y16 Ti16 O56
Y8 Ti4 O20
Y4 Ti4 O8 N4
Y2
N8
O6
Ti3
Energy above hull for None ComputedEntry - Y4 Ti4 N4 O8 (YTiNO2)
Energy (Uncorrected)     = -322.8694 eV (-16.1435 eV/atom)
Correction               = 0.0000    eV (0.0000   eV/atom)
Energy (Final)           = -322.8694 eV (-16.1435 eV/atom)
Energy Adjustments:
  None
Parameters:
Data:: 0.0000 eV