Build GGA/GGA+U/R2SCAN (Mixed) phase diagrams via new API

peikai · January 15, 2023, 5:12pm

How can I build GGA/GGA+U/R2SCAN (mixed) phase diagrams in pymatgen with the new API?

I tried to use MaterialsProjectDFTMixingScheme() to process the entries retrieved by chemical system via the new API.

entries = MaterialsProjectDFTMixingScheme().process_entries(entries, clean=True)

But it failed, since this approach requires that all entries have unique entry_ids. However, I noticed that there are multiple entries for a single entry_id (with or without energy adjustment). Moreover, get_entry_by_material_id() also returns multiple entries for a single material ID, covering different run types (GGA, R2SCAN, etc.). So how can I filter entries of a specific type in pymatgen to build GGA/GGA+U (mixed), R2SCAN-only, and GGA/GGA+U/R2SCAN (mixed) phase diagrams, as the Materials Project webapp does?

I think it would be helpful if the new API library had a method that retrieves only entries compatible with the GGA/GGA+U/R2SCAN mixing scheme, similar to the compatible_only=True parameter of get_entries(), which returns only MaterialsProject2020Compatibility entries (GGA/GGA+U mixing scheme).

Thanks in advance.

Kai

munrojm · January 18, 2023, 12:41am

Hi @peikai, the API client was just updated to ensure that there are no duplicate entry_ids for a specific thermo_type that is requested. It defaults to returning entries with corrections associated with the GGA/GGA+U mixing scheme that does not include any of the R2SCAN data.

To apply the mixing scheme yourself, you can get the uncorrected entries from the materials endpoint for each functional type and pass them to process_entries. Also, I will mention that all of our phase diagrams for the GGA/GGA+U, GGA/GGA+U/R2SCAN, and R2SCAN mixing schemes are available via the MPRester.thermo.get_phase_diagram_from_chemsys method.

– Jason

peikai · January 18, 2023, 9:40am

Hi @munrojm, thanks for the updates.

I retrieve entries and build phase diagrams locally because, before the new API and database debuted, I found that the phase diagrams displayed by the Materials Project webapp were sometimes inconsistent with self-built ones. I guess the webapp loads pre-computed phase diagrams rather than always building them from entries, right? If so, I think pre-computed phase diagrams are not always reliable, as they may be out of date after updates such as new entries, new correction schemes, or database releases.

entryList = mpr.get_entries_in_chemsys('Li-Rh-F', additional_criteria={'thermo_types': [ThermoType.GGA_GGA_U_R2SCAN]})

Following your tips, I specified the thermo_type as GGA/GGA+U/R2SCAN and got a list of entries. These entries carry different energy adjustment tags, including MP2020 anion correction, MP GGA(+U)/R2SCAN mixing adjustment, and no adjustment at all.

Can I assume that the retrieved entries have already been processed with the GGA(+U)/R2SCAN mixing scheme, or should I process them again? When I attempted to process them myself:

entryList = MaterialsProjectDFTMixingScheme().process_entries(entryList, clean=True)

The error below was raised. I’d like to know how to resolve it so I can proceed.

warnings.warn(
Retrieving ThermoDoc documents: 100%|█████████████████████████████████████████████████| 40/40 [00:00<?, ?it/s]
Processing 16 GGA(+U) and 24 R2SCAN entries…
D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\entries\compatibility.py:1044: UserWarning: Failed to guess oxidation states for Entry mp-1185455-GGA (LiRh3). Assigning anion correction to only the most electronegative atom.
warnings.warn(
D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\entries\compatibility.py:1044: UserWarning: Failed to guess oxidation states for Entry mp-1185348-GGA (LiF3). Assigning anion correction to only the most electronegative atom.
warnings.warn(
D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\entries\compatibility.py:1044: UserWarning: Failed to guess oxidation states for Entry mp-974386-GGA (Rh3F). Assigning anion correction to only the most electronegative atom.
warnings.warn(
D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\entries\compatibility.py:1044: UserWarning: Failed to guess oxidation states for Entry mp-1209757-GGA (RhF6). Assigning anion correction to only the most electronegative atom.
warnings.warn(
Processed 16 compatible GGA(+U) entries with MaterialsProject2020Compatibility
Entries belong to the {'Li', 'F', 'Rh'} chemical system
Generating mixing state data from provided entries.
Traceback (most recent call last):
File "D:\Seafile\peikai\My Libraries\Repos\Volume-Planning-for-Anodes\DFTMixingScheme.py", line 18, in <module>
entryList = MaterialsProjectDFTMixingScheme().process_entries(entryList, clean=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\entries\mixing_scheme.py", line 181, in process_entries
mixing_state_data = self.get_mixing_state_data(entries_type_1 + entries_type_2, verbose=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\entries\mixing_scheme.py", line 567, in get_mixing_state_data
for group in self.structure_matcher.group_structures(l_pregroup):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\structure_matcher.py", line 848, in group_structures
inds = list(inds)
^^^^^^^^^^
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\structure_matcher.py", line 845, in <lambda>
lambda i: self.fit(refs, unmatched[i][1], skip_structure_reduction=True),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\structure_matcher.py", line 609, in fit
match = self._match(struct1, struct2, fu, s1_supercell, break_on_match=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\structure_matcher.py", line 709, in _match
return self._strict_match(
^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\structure_matcher.py", line 760, in _strict_match
if LinearAssignment(mask).min_cost > 0: # pylint: disable=E1101
^^^^^^^^^^^^^^^^^^^^^^
File "pymatgen\optimization\linear_assignment.pyx", line 72, in pymatgen.optimization.linear_assignment.LinearAssignment.__init__
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\numpy\__init__.py", line 284, in __getattr__
raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'int'

munrojm · January 18, 2023, 7:25pm

Hi @peikai, the phase diagrams shown on the website are constructed on the fly as well, and we are aware of the issue causing some of them to not appear correct. This will be fixed soon. The pre-built phase diagrams available from the API are different and should be correct.

That being said, you are doing everything right to get corrected entries locally. That issue is from a numpy update that happened recently. If you upgrade your pymatgen install, I believe it should work.
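For context on the traceback above: the `np.int` alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, so compiled code that still refers to it fails with this AttributeError. The following is a minimal sketch of a version check, not part of pymatgen or NumPy. The helper name `numpy_has_int_alias` is hypothetical, and it only inspects a plain `major.minor.patch` version string rather than importing NumPy.

```python
def numpy_has_int_alias(version: str) -> bool:
    """Return True if this NumPy version still exposes the `np.int` alias.

    The alias was removed in NumPy 1.24, so only older releases have it.
    """
    # Keep only the leading "major.minor" part, e.g. "1.24.1" -> (1, 24).
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (1, 24)


for v in ("1.23.5", "1.24.1"):
    status = "has" if numpy_has_int_alias(v) else "lacks"
    print(f"NumPy {v} {status} np.int")
```

In practice one would pass `numpy.__version__` to such a check; the fix suggested in the thread is still to upgrade pymatgen rather than to pin NumPy.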

– Jason

peikai · January 20, 2023, 10:07am

Yes, the up-to-date version works. Thanks! @munrojm

When I tried to plot a pre-computed phase diagram to compare it with the self-built one, an error was raised concerning the elemental references in the pre-computed data. I’ve opened a PR with my changes to the PhaseDiagram class to resolve it. However, it would be better to also fix this on the cloud data side, once there is an opportunity to update the pre-computed phase diagrams.
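The linked PR attributes the error to `el_refs` keys being `str` objects instead of `Element` objects. A minimal sketch of that kind of key normalization is given here. To stay self-contained it uses a stand-in `Element` dataclass rather than pymatgen's class, and `normalize_el_refs` is a hypothetical helper, not the code from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Stand-in for pymatgen's Element, used only to keep this sketch self-contained."""

    symbol: str


def normalize_el_refs(el_refs: dict) -> dict:
    """Return a copy of el_refs whose keys are all Element objects."""
    return {
        (key if isinstance(key, Element) else Element(key)): value
        for key, value in el_refs.items()
    }


# Keys arrive as plain strings, as described for the pre-computed diagrams.
raw = {"Li": "Li reference entry", "Fe": "Fe reference entry", "O": "O reference entry"}
fixed = normalize_el_refs(raw)

print(Element("Li") in raw)    # lookup by Element misses the str key
print(Element("Li") in fixed)  # lookup succeeds after normalization
```

With the real library, the conversion would presumably use pymatgen's own `Element` constructor on each string key.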

github.com/materialsproject/pymatgen

KeyError of Elemental references in pre-computed phase diagram

materialsproject:master ← peikai:master

opened 02:30AM - 20 Jan 23 UTC

peikai

+627582 -1

When I try to plot a pre-computed phase diagram retrieved via the get_phase_diagram_from_chemsys() method,

> phaseDiagram = mpr.get_phase_diagram_from_chemsys('Li-Fe-O')
> plotter = PDPlotter(phaseDiagram)

a KeyError is raised, as shown below. It arises from an inconsistency in the objects stored in the el_refs dictionary: the keys stored in phaseDiagram.el_refs are **str** objects (_dict_keys(['O', 'Fe', 'Li'])_) instead of **Element** objects (_dict_keys([Element O, Element Fe, Element Li])_). The pre-computed phase diagrams in the database might not be easy to update, so I added some code to the pymatgen.analysis.phase_diagram.PhaseDiagram class that converts the keys of the elemental reference dictionary to Element objects when a pre-computed phase diagram is loaded. A corresponding unit test was also added to check the keys of the el_refs dictionary.

KeyError message:

> Traceback (most recent call last):
File "D:\Seafile\peikai\My Libraries\Repos\Volume-Planning-for-Anodes\phase_diagram_mixscheme.py", line 37, in <module>
plotter = PDPlotter(phase_diagram, show_unstable=0, backend='matplotlib')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\phase_diagram.py", line 2111, in __init__
self._min_energy = min(self._pd.get_form_energy_per_atom(e) for e in self._pd.stable_entries)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\phase_diagram.py", line 2111, in <genexpr>
self._min_energy = min(self._pd.get_form_energy_per_atom(e) for e in self._pd.stable_entries)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\phase_diagram.py", line 577, in get_form_energy_per_atom
return self.get_form_energy(entry) / entry.composition.num_atoms
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\phase_diagram.py", line 564, in get_form_energy
return entry.energy - sum(comp[el] * self.el_refs[el].energy_per_atom for el in comp.elements)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\phase_diagram.py", line 564, in <genexpr>
return entry.energy - sum(comp[el] * self.el_refs[el].energy_per_atom for el in comp.elements)
~~~~~~~~~~~~^^^^
KeyError: Element Li

peikai · January 24, 2023, 2:35pm

Hi, @munrojm

I found some conflicts between the pre-computed phase diagrams and the self-built ones.

Na-Cl-O chemical system

Pre-computed phase diagram:

Self-built phase diagrams:

1. Generated by entries of GGA/GGA+U/R2SCAN thermo-type:

2. Generated via local mixing processing:

Webapp phase diagram GGA/GGA+U/R2SCAN (Mixed):

Li-Sb-Se chemical system

Pre-computed phase diagram:

Self-built phase diagrams:

1. Generated by entries of GGA/GGA+U/R2SCAN thermo-type:

2. Generated via local mixing processing:

Webapp phase diagram GGA/GGA+U/R2SCAN (Mixed):

Methods:

Pre-computed phase diagrams are retrieved directly from the thermo endpoint:

phase_diagram = mpr.get_phase_diagram_from_chemsys('Li-Sb-Se', thermo_type=ThermoType.GGA_GGA_U_R2SCAN)

Self-built phase diagrams are generated as below:

1. Generated by entries of GGA/GGA+U/R2SCAN thermo-type

entryList = mpr.get_entries_in_chemsys('Li-Be-O', additional_criteria={'thermo_types': [ThermoType.GGA_GGA_U_R2SCAN]})
phase_diagram = PhaseDiagram(entryList)

2. Generated via local mixing processing

entryList1 = mpr.get_entries_in_chemsys('Na-Cl-O', additional_criteria={'thermo_types': [ThermoType.R2SCAN]})
entryList2 = mpr.get_entries_in_chemsys('Na-Cl-O', additional_criteria={'thermo_types': [ThermoType.GGA_GGA_U]})
entryList = MaterialsProjectDFTMixingScheme().process_entries(entryList1 + entryList2)
phase_diagram = PhaseDiagram(entryList)

I found that the self-built phase diagrams produced via local mixing processing are the most reliable and are consistent with the webapp. The pre-computed phase diagrams and those built without local mixing processing differ from them.
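One way to make such a comparison concrete is to diff the sets of stable phases from two diagrams. The sketch below is only an illustration: `diff_stable_phases` is a hypothetical helper, and the formulas are abstract placeholders, not results from this thread. With pymatgen, the inputs could be built as `{e.composition.reduced_formula for e in pd.stable_entries}`.

```python
def diff_stable_phases(formulas_a, formulas_b):
    """Return (only_in_a, only_in_b) as sorted lists of formulas."""
    a, b = set(formulas_a), set(formulas_b)
    return sorted(a - b), sorted(b - a)


# Placeholder formulas for illustration only; not data from the thread.
precomputed = ["A", "B", "AB"]
self_built = ["A", "B", "AB", "AB2"]

only_pre, only_self = diff_stable_phases(precomputed, self_built)
print("Only in pre-computed:", only_pre)
print("Only in self-built:", only_self)
```

Comparing reduced formulas ignores polymorphs and energies, so this is a coarse first check rather than a full equivalence test.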

Kai Pei

peikai · January 24, 2023, 2:56pm

I submitted another PR that raises an exception when the PhaseDiagram class receives an empty entryList as input, which can occur in some mixing processes.

github.com/materialsproject/pymatgen

Raise ValueError when building phase diagram without entries.

materialsproject:master ← peikai:master

opened 03:18PM - 23 Jan 23 UTC

peikai

+8 -0

In the case that there are not enough entries to form a complete phase diagram, the mixing scheme processing fails, as expected. However, the failed process_entries() call might return an empty list, i.e., entryList = [], which then causes an error in the PhaseDiagram class when computing data, as shown below. I committed some code that raises a ValueError with a specific message when attempting to build a phase diagram with no entries. A unit test was also added to check that this works properly.

Run:

> entryList = MaterialsProjectDFTMixingScheme().process_entries(entryList, clean=True)
> phase_diagram = PhaseDiagram(entryList)

Errors:

> Generating mixing state data from provided entries.
D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\entries\mixing_scheme.py:508: UserWarning: GGA(+U) entries do not form a complete PhaseDiagram.
warnings.warn(f"{self.run_type_1} entries do not form a complete PhaseDiagram.")
Entries contain R2SCAN calculations for 0 of 0 GGA(+U) hull entries.
GGA(+U) energies will be adjusted to the R2SCAN scale
D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\entries\mixing_scheme.py:215: UserWarning: WARNING! GGA(+U) entries do not form a complete PhaseDiagram. No energy adjustments will be applied.
warnings.warn(str(exc))
Processing complete. Mixed entries contain 0 GGA(+U) and 0 R2SCAN entries.

> Traceback (most recent call last):
phase_diagram = PhaseDiagram(entryList)
^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\phase_diagram.py", line 366, in __init__
computed_data = self._compute()
^^^^^^^^^^^^^^^
File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\phase_diagram.py", line 445, in _compute
form_e = -np.dot(data, vec)
^^^^^^^^^^^^^^^^^
File "<__array_function__ internals>", line 180, in dot
ValueError: shapes (0,) and (1,) not aligned: 0 (dim 0) != 1 (dim 0)
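Until such a check is available in the installed pymatgen version, the same idea can be applied on the caller's side. This is a minimal sketch under that assumption; `require_entries` is a hypothetical helper and not the code from the PR.

```python
def require_entries(entries):
    """Return entries as a list, or raise ValueError if there are none."""
    entries = list(entries)
    if not entries:
        raise ValueError("Cannot build a phase diagram from an empty list of entries.")
    return entries


# An empty result, as process_entries() may return when mixing fails.
try:
    require_entries([])
except ValueError as exc:
    print(f"Caught: {exc}")
```

The guard would sit between the `process_entries()` call and the `PhaseDiagram(...)` constructor, so the failure is reported with a clear message instead of a NumPy shape error.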

munrojm · January 24, 2023, 8:19pm

@peikai, it looks like your locally built phase diagrams are missing entries. The thermo API data for each material contains the canonical data used in the phase diagram for a given thermo_type (mixing scheme). In other words, a material might have a blessed GGA, GGA+U and R2SCAN calculation associated with it, but only one is chosen to be on the mixed phase diagram and included in entries in the thermo data document.

To compute the diagrams as our builders do, you should pull all of the uncorrected GGA, GGA+U, and R2SCAN ComputedStructureEntry data from the materials endpoint for every material within a specific chemical system and its subsystems (you can request the same entries field from mpr.materials.search), decorate it with oxidation state data from mpr.oxidation_states.search (we do that in emmet/thermo.py at commit 1a185027d017475e6112164df50428a0b06406c8 of materialsproject/emmet on GitHub), pass everything to the mixing scheme class as you have done, and then construct the phase diagram with the corrected entries.
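The phrase "a specific chemical system and its subsystems" can be made explicit with a small enumeration. The sketch below is an assumption-laden illustration: `chemsys_with_subsystems` is a hypothetical helper, and sorting the element symbols alphabetically is an assumption about how chemsys strings are normalized, not something stated in this thread.

```python
from itertools import combinations


def chemsys_with_subsystems(chemsys: str) -> list[str]:
    """List a chemical system such as 'Li-Rh-F' together with all of its subsystems."""
    # Deduplicate and sort the element symbols (assumed normalization).
    elements = sorted(set(chemsys.split("-")))
    return [
        "-".join(combo)
        for n in range(1, len(elements) + 1)
        for combo in combinations(elements, n)
    ]


print(chemsys_with_subsystems("Li-Rh-F"))
```

Each resulting string could then be used as the chemical system when requesting the uncorrected entries, before passing the combined list to the mixing scheme.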

– Jason

peikai · January 25, 2023, 9:17am

Hi @munrojm,

I have shown three routes to build phase diagrams: (1) retrieve pre-computed phase diagrams directly; (2) retrieve entries with a specific thermo_type and build the diagram locally; (3) retrieve all entries, apply the mixing scheme locally, and build the diagram. I think routes 1 and 2 can show wrong phase diagrams, while route 3 is reliable.

What I did in route 3 is exactly that: merge all GGA/GGA+U/R2SCAN entries and then process them with the mixing scheme locally (see the code in my earlier post). The oxidation state data should already be contained in the entries, and the graphs look reliable (they are under the title “2. Generated via local mixing processing”). Do you think this is the right way to reproduce the phase diagrams? Thanks!

However, if we agree that route 3 is the most reliable, then the pre-computed phase diagrams (route 1) should be regarded with suspicion. See the conflicts in the Na-Cl-O phase diagrams (under the titles “pre-computed phase diagram” and “2. Generated via local mixing processing”).

Moreover, as you said, entries with a specific thermo_type (GGA_GGA_U_R2SCAN) have already been processed and screened by the mixing scheme. Hence, it should be possible to use them directly to build the GGA/GGA+U/R2SCAN (mixed) phase diagram (route 2). But that is not the case: the graph (under the title “1. Generated by entries of GGA/GGA+U/R2SCAN thermo-type”) still differs from the pre-computed phase diagram (under the title “pre-computed phase diagram”).

munrojm · January 25, 2023, 8:41pm

Ah, okay I see what you mean. I didn’t pay close enough attention to the way you were pulling entry data. I am going to take a closer look at this ASAP. Thank you for taking the time to post all of this information.

– Jason

munrojm · January 25, 2023, 11:58pm

@peikai, there is definitely an issue with the phase diagram part of our build pipeline. I am addressing this now, and the data fix should be in soon.

– Jason

peikai · January 26, 2023, 10:00am

@munrojm, thanks a lot!

A correction:
The webapp phase diagram of the Na-Cl-O chemical system should be the graph below, not the one I attached earlier. This does not affect my conclusions: it still differs from the pre-computed phase diagram and matches the phase diagram built via route 3.

Kai Pei

peikai · January 26, 2023, 10:52am

The Materials Project papers that discuss GGA/GGA+U phase diagrams are well known. Is there any paper discussing the accuracy of the GGA/GGA+U/R2SCAN mixed phase diagrams? Does the mixing scheme make them superior to GGA/GGA+U phase diagrams?

Thanks!

Formation enthalpies by mixing GGA and GGA+U calculations, Phys. Rev. B 84, 045115
Li−Fe−P−O2 Phase Diagram from First Principles Calculations, Chem. Mater. 2008, 20, 5, 1798–1807

munrojm · January 29, 2023, 8:40am

@peikai, I have a fix for the data which should be up in the next couple days. Also, here is the publication associated with the GGA/GGA+U/R2SCAN mixing scheme: “A flexible and scalable scheme for mixing computed formation energies from different levels of theory” (npj Computational Materials).

– Jason

peikai · January 29, 2023, 9:18am

@munrojm Thanks for timely updates!

By the way, I’m wondering about the difference between thermo_id and entry_id in the new API. They look similar, i.e., [MPID]_[thermo_type] and [MPID]-[run_type], respectively. Is thermo_id an alias for entry_id in the new API?

I’m going to index entries locally. The index should be unique and in one-to-one correspondence with the entries. However, I found that an entry_id could correspond to multiple entries before you fixed it.

I’m confused about which ID can serve as a unique index for entries:

Does an entry_id always correspond to a unique entry?
Does a thermo_id always correspond to a unique entry? I noticed that thermo_id is unique for a ThermoDoc, but I’m not sure whether a ThermoDoc always contains a single entry.
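To illustrate the two ID layouts described above, here is a minimal parsing sketch. It assumes MPIDs contain hyphens but no underscores, and that run types contain no hyphens. Both helper names are hypothetical, the entry_id example is taken from the warnings earlier in the thread, and the thermo_id example is made up to match the stated [MPID]_[thermo_type] pattern.

```python
def parse_entry_id(entry_id: str) -> tuple[str, str]:
    """Split '[MPID]-[run_type]' at the last hyphen."""
    mpid, run_type = entry_id.rsplit("-", 1)
    return mpid, run_type


def parse_thermo_id(thermo_id: str) -> tuple[str, str]:
    """Split '[MPID]_[thermo_type]' at the first underscore."""
    mpid, thermo_type = thermo_id.split("_", 1)
    return mpid, thermo_type


print(parse_entry_id("mp-1185455-GGA"))
# Hypothetical thermo_id, following the pattern described in the post.
print(parse_thermo_id("mp-1185455_GGA_GGA+U_R2SCAN"))
```

The two IDs carry different information: the entry_id names a single calculation type, while the thermo_id names the mixing scheme under which a document was built.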

Thanks!
Kai Pei

munrojm · January 29, 2023, 7:10pm

@peikai no problem! I’ll ping you when the data is live.

This is actually a timely question. I believe we just merged in changes to the __eq__ method for Entry objects in pymatgen in response to a discussion around unique identifiers for them. The change alters equality evaluation to not just use the entry_id (MPID + run type), but also include the correction. entry_id + correction type and amount always identifies a unique entry.

That being said, the ThermoDoc data from the API only contains a single “blessed” entry for a specific material. This corresponds to the one ComputedStructureEntry object chosen by the mixing scheme that is reflected in the thermo_type. In principle, the thermo_type is probably enough to index the entry data, unless you are going to store multiple sets of entries processed by the same mixing scheme class.
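The identification rule described above (entry_id plus correction type and amount) can be sketched as a composite dictionary key. This is only an illustration: `EntryRecord` is a stand-in for a pymatgen entry with illustrative field names, `index_entries` is a hypothetical helper, and the IDs and correction values are made up.

```python
from typing import NamedTuple


class EntryRecord(NamedTuple):
    """Stand-in for a pymatgen entry; field names are illustrative."""

    entry_id: str
    correction_name: str
    correction: float


def index_entries(records):
    """Index records by (entry_id, correction name, correction amount)."""
    index = {}
    for rec in records:
        # Round the amount so tiny floating-point noise does not split keys.
        key = (rec.entry_id, rec.correction_name, round(rec.correction, 6))
        if key in index:
            raise ValueError(f"Duplicate key: {key}")
        index[key] = rec
    return index


# Same entry_id twice, with and without an adjustment (values are made up).
records = [
    EntryRecord("mp-0000-GGA", "none", 0.0),
    EntryRecord("mp-0000-GGA", "anion correction", -0.5),
]
index = index_entries(records)
print(len(index))
```

Keying on entry_id alone would collapse these two records into one, which is the collision reported at the start of the thread.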

– Jason

munrojm · February 7, 2023, 12:17am

@peikai, the data should be updated. Sorry for the late reply. Took longer than expected to make the fix and propagate it through our pipelines.

– Jason

peikai · February 19, 2023, 7:08am

@munrojm,
I’ve checked phase diagrams above; all aforementioned conflicts have been fixed. Thanks a lot.

And I’ve been experimenting with ways of indexing entries, and those updates give me more options.

Ting_Wang · April 2, 2023, 8:33pm

Hi @munrojm, I tried to get only GGA and GGA+U data using the method discussed above:

entries = mpr.get_entries_in_chemsys(elements, additional_criteria={'thermo_types': [ThermoType.GGA_GGA_U]})

The error is:

name 'ThermoType' is not defined

I haven’t found any information about ThermoType. Should I import it? Thank you so much!

munrojm · April 2, 2023, 9:21pm

See the code snippet on this page: https://docs.materialsproject.org/methodology/materials-methodology/thermodynamic-stability/phase-diagrams-pds
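The NameError means that `ThermoType` was never imported. A minimal sketch follows, assuming the enum lives in `emmet.core.thermo` (the linked documentation page is the authoritative reference). The string fallback for environments without emmet-core is also an assumption about the enum's value.

```python
try:
    # Assumed import location for the enum used in this thread.
    from emmet.core.thermo import ThermoType

    gga_gga_u = ThermoType.GGA_GGA_U
except ImportError:
    # Fallback when emmet-core is not installed: the plain string form
    # (assumed to match the enum's value).
    gga_gga_u = "GGA_GGA+U"

# Criteria dictionary in the shape used by the calls earlier in the thread.
additional_criteria = {"thermo_types": [gga_gga_u]}
print(additional_criteria)
```

The dictionary would then be passed as `additional_criteria=additional_criteria` to `mpr.get_entries_in_chemsys(...)`, as in the post above.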

– Jason