Build GGA/GGA+U/R2SCAN (Mixed) phase diagrams via new API

How can I build GGA/GGA+U/R2SCAN (Mixed) phase diagrams in pymatgen with the new API?

I tried to use MaterialsProjectDFTMixingScheme() to process the entries retrieved by chemical system via the new API:

entries = MaterialsProjectDFTMixingScheme().process_entries(entries, clean=True)

But it failed, since this approach requires all entries to have unique entry_ids. However, I noticed that a single entry id can correspond to multiple entries (with or without energy adjustments). Moreover, get_entry_by_material_id() also returns multiple entries for one material id, covering different run types (GGA, R2SCAN, etc.). So how can I filter entries of a specific type in pymatgen to build GGA/GGA+U (Mixed), R2SCAN-only, and GGA/GGA+U/R2SCAN (Mixed) phase diagrams, like the Materials Project web app does?
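
A minimal sketch of how the duplicate-id problem described above could be diagnosed before calling process_entries. `StubEntry` is a hypothetical stand-in used only so the snippet runs without an API key; the helpers read nothing beyond an `entry_id` attribute and a caller-supplied key function. For real pymatgen entries, the run type is assumed to be available as `entry.parameters["run_type"]`.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class StubEntry:
    """Hypothetical stand-in for a pymatgen ComputedEntry (illustration only)."""
    entry_id: str
    run_type: str


def find_duplicate_ids(entries):
    """Return {entry_id: count} for every entry_id that occurs more than once."""
    counts = Counter(e.entry_id for e in entries)
    return {eid: n for eid, n in counts.items() if n > 1}


def group_by(entries, key):
    """Group entries by the value returned from key(entry)."""
    groups = defaultdict(list)
    for e in entries:
        groups[key(e)].append(e)
    return dict(groups)


entries = [
    StubEntry("mp-1-GGA", "GGA"),
    StubEntry("mp-1-GGA", "GGA"),  # same id twice, e.g. with and without adjustment
    StubEntry("mp-1-R2SCAN", "R2SCAN"),
]
duplicates = find_duplicate_ids(entries)
by_run_type = group_by(entries, key=lambda e: e.run_type)
print(duplicates)
print({run_type: len(group) for run_type, group in by_run_type.items()})
```

With real entries, the key function would presumably be something like `lambda e: e.parameters.get("run_type")`.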

I think it would be helpful if the new API library had a method that retrieves only entries compatible with the GGA/GGA+U/R2SCAN mixing scheme, similar to the compatible_only=True parameter of get_entries(), which returns only MaterialsProject2020Compatibility entries (GGA/GGA+U mixing scheme).

Thanks in advance.

Kai

Hi @peikai, the API client was just updated to ensure that there are no duplicate entry_ids for the specific thermo_type that is requested. It defaults to returning entries with the corrections associated with the GGA/GGA+U mixing scheme, which does not include any of the R2SCAN data.

To apply the mixing scheme yourself, you can get the uncorrected entries from the materials endpoint for each functional type and pass them to process_entries. Also, I will mention that all of our phase diagrams for the GGA/GGA+U, GGA/GGA+U/R2SCAN, and R2SCAN mixing schemes are available via the MPRester.thermo.get_phase_diagram_from_chemsys method.

– Jason

Hi @munrojm, thanks for the updates.

I retrieve entries and build phase diagrams locally because, before the new API and database were released, I found that the phase diagrams displayed by the Materials Project web app were sometimes inconsistent with the ones I built myself. I guess the web app loads pre-computed phase diagrams rather than always building them from entries, right? If so, I think pre-computed phase diagrams are not always reliable, as they may be out of date after updates such as new entries, new correction schemes, or new database releases.

entryList = mpr.get_entries_in_chemsys('Li-Rh-F', additional_criteria={'thermo_types': [ThermoType.GGA_GGA_U_R2SCAN]})

Following your tips, I specified the thermo_type as GGA/GGA+U/R2SCAN and got a list of entries. These entries carry different energy adjustment tags, including MP2020 anion correction, MP GGA(+U)/R2SCAN mixing adjustment, and no adjustment (None).

Can I assume that the retrieved entries have already been processed through the GGA(+U)/R2SCAN mixing scheme compatibility, or should I process them again? When I attempted to process them myself:
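
As an illustration of how the adjustment tags mentioned above might be tallied, here is a small sketch. It assumes each entry exposes an `energy_adjustments` list whose items have a `name` attribute, as pymatgen's ComputedEntry does; the stub objects and their adjustment names are hypothetical and only make the snippet runnable offline.

```python
from collections import Counter
from types import SimpleNamespace


def tally_adjustments(entries):
    """Count entries per combination of energy-adjustment names."""
    tally = Counter()
    for entry in entries:
        names = tuple(sorted(adj.name for adj in entry.energy_adjustments))
        # Entries without any adjustment are counted under ("None",).
        tally[names or ("None",)] += 1
    return tally


def stub(*names):
    """Hypothetical stand-in for an entry carrying the given adjustment names."""
    return SimpleNamespace(energy_adjustments=[SimpleNamespace(name=n) for n in names])


sample = [
    stub("MP2020 anion correction"),
    stub("MP GGA(+U)/R2SCAN mixing adjustment"),
    stub(),
    stub(),
]
adjustment_tally = tally_adjustments(sample)
for names, count in adjustment_tally.items():
    print(count, names)
```

Running the same function on `entryList` should show at a glance how many entries fall under each tag.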

entryList = MaterialsProjectDFTMixingScheme().process_entries(entryList, clean=True)

The error below was raised. I’d like to know how to resolve it so that I can proceed.

warnings.warn(
Retrieving ThermoDoc documents: 100%|█████████████████████████████████████████████████| 40/40 [00:00<?, ?it/s]
Processing 16 GGA(+U) and 24 R2SCAN entries…
D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\entries\compatibility.py:1044: UserWarning: Failed to guess oxidation states for Entry mp-1185455-GGA (LiRh3). Assigning anion correction to only the most electronegative atom.
warnings.warn(
D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\entries\compatibility.py:1044: UserWarning: Failed to guess oxidation states for Entry mp-1185348-GGA (LiF3). Assigning anion correction to only the most electronegative atom.
warnings.warn(
D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\entries\compatibility.py:1044: UserWarning: Failed to guess oxidation states for Entry mp-974386-GGA (Rh3F). Assigning anion correction to only the most electronegative atom.
warnings.warn(
D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\entries\compatibility.py:1044: UserWarning: Failed to guess oxidation states for Entry mp-1209757-GGA (RhF6). Assigning anion correction to only the most electronegative atom.
warnings.warn(
Processed 16 compatible GGA(+U) entries with MaterialsProject2020Compatibility
Entries belong to the {'Li', 'F', 'Rh'} chemical system
Generating mixing state data from provided entries.
Traceback (most recent call last):
  File "D:\Seafile\peikai\My Libraries\Repos\Volume-Planning-for-Anodes\DFTMixingScheme.py", line 18, in <module>
    entryList = MaterialsProjectDFTMixingScheme().process_entries(entryList, clean=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\entries\mixing_scheme.py", line 181, in process_entries
    mixing_state_data = self.get_mixing_state_data(entries_type_1 + entries_type_2, verbose=False)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\entries\mixing_scheme.py", line 567, in get_mixing_state_data
    for group in self.structure_matcher.group_structures(l_pregroup):
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\structure_matcher.py", line 848, in group_structures
    inds = list(inds)
           ^^^^^^^^^^
  File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\structure_matcher.py", line 845, in <lambda>
    lambda i: self.fit(refs, unmatched[i][1], skip_structure_reduction=True),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\structure_matcher.py", line 609, in fit
    match = self._match(struct1, struct2, fu, s1_supercell, break_on_match=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\structure_matcher.py", line 709, in _match
    return self._strict_match(
           ^^^^^^^^^^^^^^^^^^^
  File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\pymatgen\analysis\structure_matcher.py", line 760, in _strict_match
    if LinearAssignment(mask).min_cost > 0: # pylint: disable=E1101
       ^^^^^^^^^^^^^^^^^^^^^^
  File "pymatgen\optimization\linear_assignment.pyx", line 72, in pymatgen.optimization.linear_assignment.LinearAssignment.__init__
  File "D:\Anaconda\envs\pymatgen2023\Lib\site-packages\numpy\__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'int'

Hi @peikai, the phase diagrams shown on the website are constructed on the fly as well, and we are aware of the issue causing some of them to not appear correct. This will be fixed soon. The pre-built phase diagrams available from the API are different and should be correct.

That being said, you are doing everything right to get corrected entries locally. That issue is from a numpy update that happened recently. If you upgrade your pymatgen install, I believe it should work.
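
For context, the `numpy.int` alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, so code that still refers to it fails on newer NumPy installs. The sketch below is one way to check whether an environment is in that situation; the helper names are illustrative and not part of numpy or pymatgen.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def numpy_int_alias_removed(numpy_version):
    """True if this NumPy version no longer provides the numpy.int alias (>= 1.24)."""
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    return (major, minor) >= (1, 24)


for package in ("numpy", "pymatgen"):
    print(package, installed_version(package))

numpy_version = installed_version("numpy")
if numpy_version is not None and numpy_int_alias_removed(numpy_version):
    print("numpy.int is gone in this NumPy; an up-to-date pymatgen is needed.")
```

Upgrading pymatgen (for example with `pip install --upgrade pymatgen`) is the fix suggested above; pinning NumPy below 1.24 would be a workaround rather than a fix.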

– Jason

Yes, the up-to-date version works. Thanks! @munrojm

When I tried to plot a pre-computed phase diagram to compare it with a self-built one, an error was raised concerning the elemental references in the pre-computed data. I’ve opened a PR with my changes to the PhaseDiagram class to resolve it. However, it would be better to also fix it on the data side, once there is an opportunity to update the pre-computed phase diagrams.

Hi, @munrojm

I found some conflicts between the pre-computed phase diagrams and the self-built ones.

Na-Cl-O chemical system

pre-computed phase diagram:

self-built phase diagram:

  1. Generated by entries of GGA/GGA+U/R2SCAN thermo-type:

  2. Generated via local mixing processing:

Webapp phase diagram GGA/GGA+U/R2SCAN (Mixed):


Li-Sb-Se chemical system

pre-computed phase diagram:

self-built phase diagram:

  1. Generated by entries of GGA/GGA+U/R2SCAN thermo-type:

  2. Generated via local mixing processing:

Webapp phase diagram GGA/GGA+U/R2SCAN (Mixed):


Methods:

Pre-computed phase diagrams are retrieved directly from the thermo endpoint:

phase_diagram = mpr.thermo.get_phase_diagram_from_chemsys('Li-Sb-Se', thermo_type=ThermoType.GGA_GGA_U_R2SCAN)

Self-built phase diagrams are generated as below:

  1. Generated by entries of GGA/GGA+U/R2SCAN thermo-type

    entryList = mpr.get_entries_in_chemsys('Li-Be-O', additional_criteria={'thermo_types': [ThermoType.GGA_GGA_U_R2SCAN]})
    phase_diagram = PhaseDiagram(entryList)

  2. Generated via local mixing processing

    entryList1 = mpr.get_entries_in_chemsys('Na-Cl-O', additional_criteria={'thermo_types': [ThermoType.R2SCAN]})
    entryList2 = mpr.get_entries_in_chemsys('Na-Cl-O', additional_criteria={'thermo_types': [ThermoType.GGA_GGA_U]})
    entryList = MaterialsProjectDFTMixingScheme().process_entries(entryList1+entryList2)
    phase_diagram = PhaseDiagram(entryList)


I found that self-built phase diagrams via local mixing processing are the most reliable and consistent with the webapp. The pre-computed phase diagrams and those built without local mixing processing are different.
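
One way to make the comparison between two diagrams concrete is to diff their sets of stable entries. The sketch below works on plain id collections; with pymatgen, such a set could be built as `{e.entry_id for e in phase_diagram.stable_entries}`. The ids in the example are hypothetical.

```python
def compare_stable_sets(ids_a, ids_b):
    """Report which stable entry ids are unique to each diagram and which are shared."""
    a, b = set(ids_a), set(ids_b)
    return {
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
        "shared": sorted(a & b),
    }


# Hypothetical ids standing in for the stable entries of two diagrams.
precomputed_ids = ["mp-1-R2SCAN", "mp-2-GGA", "mp-3-GGA"]
self_built_ids = ["mp-1-R2SCAN", "mp-2-R2SCAN", "mp-3-GGA"]

diff = compare_stable_sets(precomputed_ids, self_built_ids)
for label, ids in diff.items():
    print(label, ids)
```

An empty `only_in_a` and `only_in_b` would indicate that both routes agree on the stable phases, although their energies could still differ.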

Kai Pei

I submitted another PR to raise an exception when the PhaseDiagram class receives an empty entry list as input, which might occur in some mixing processes.

@peikai, it looks like your locally built phase diagrams are missing entries. The thermo API data for each material contains the canonical data used in the phase diagram for a given thermo_type (mixing scheme). In other words, a material might have a blessed GGA, GGA+U and R2SCAN calculation associated with it, but only one is chosen to be on the mixed phase diagram and included in entries in the thermo data document.

To compute the diagrams as our builders do, you should pull all of the uncorrected GGA, GGA+U, and R2SCAN ComputedStructureEntry data from the materials endpoint for every material within a specific chemical system and its subsystems (you can request the same entries field from mpr.materials.search), decorate it with oxidation state data from mpr.oxidation_states.search (we do that here emmet/thermo.py at 1a185027d017475e6112164df50428a0b06406c8 · materialsproject/emmet · GitHub), pass everything to the mixing scheme class as you have done, and then construct the phase diagram with the corrected entries.
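
The "chemical system and its subsystems" part of the recipe above can be sketched without any API call. The helper below enumerates every subsystem of a chemical system string; the function name is illustrative, and sorting the elements alphabetically within each string is an assumption about the expected chemsys format.

```python
from itertools import combinations


def chemsys_with_subsystems(chemsys):
    """List a chemical system together with all of its non-empty subsystems."""
    elements = sorted(set(chemsys.split("-")))
    return [
        "-".join(combo)
        for size in range(1, len(elements) + 1)
        for combo in combinations(elements, size)
    ]


systems = chemsys_with_subsystems("Li-Rh-F")
print(systems)
```

Each string in the resulting list could then be used when querying the materials endpoint, so that the elemental and binary reference entries are not missed.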

– Jason

Hi @munrojm,

I have shown three routes to build phase diagrams: (1) retrieve pre-computed phase diagrams directly; (2) retrieve entries with a specific thermo_type and build the diagram locally; (3) retrieve all entries, apply the mixing scheme locally, and build the diagram. I think routes 1 and 2 can show wrong phase diagrams, while route 3 is reliable.

What I have done in route 3 is exactly to merge all GGA/GGA+U/R2SCAN entries and then process them with the mixing scheme locally; see the code below. The oxidation state data should already be contained in the entries, and the graphs look reliable. (They are located under the title “2. Generated via local mixing processing”.) Do you think this is the right way to reproduce the phase diagrams? Thanks!

However, if we agree that route 3 is the most reliable and correct one, then the pre-computed phase diagrams (route 1) should be regarded with suspicion. See the conflicts in the Na-Cl-O phase diagrams (under the titles “pre-computed phase diagram” and “2. Generated via local mixing processing”).

Moreover, as you said, entries with a specific thermo_type (GGA_GGA_U_R2SCAN) have already been processed and screened by the mixing scheme. Hence, it should be possible to use them directly to build a GGA/GGA+U/R2SCAN (Mixed) phase diagram (route 2). But that is not the case: the graph (under the title “1. Generated by entries of GGA/GGA+U/R2SCAN thermo-type”) still differs from the pre-computed phase diagram (under the title “pre-computed phase diagram”).

Ah, okay I see what you mean. I didn’t pay close enough attention to the way you were pulling entry data. I am going to take a closer look at this ASAP. Thank you for taking the time to post all of this information.

– Jason

@peikai, there is definitely an issue with the phase diagram part of our build pipeline. I am addressing this now, and the data fix should be in soon.

– Jason

@munrojm, thanks a lot!

A correction:
The web app phase diagram of the Na-Cl-O chemical system should be the graph below, not the one I attached earlier. This does not affect my point: it is still different from the pre-computed phase diagram and the same as the phase diagram built via route 3.

Kai Pei

The Materials Project papers that discuss the GGA/GGA+U phase diagrams are well known (listed below). Is there a paper discussing the accuracy of the GGA/GGA+U/R2SCAN mixed phase diagrams? Does the mixing make them superior to the GGA/GGA+U phase diagrams?

Thanks!

  1. Formation enthalpies by mixing GGA and GGA+U calculations, Phys. Rev. B 84, 045115
  2. Li−Fe−P−O2 Phase Diagram from First Principles Calculations, Chem. Mater. 2008, 20, 5, 1798–1807

@peikai, I have a fix for the data which should be up in the next couple days. Also, here is the publication associated with the GGA/GGA+U/R2SCAN mixing scheme: A flexible and scalable scheme for mixing computed formation energies from different levels of theory | npj Computational Materials

– Jason

@munrojm Thanks for timely updates!

By the way, I’m wondering about the difference between thermo_id and entry_id in the new API. They look similar, i.e., [MPID]_[thermo_type] and [MPID]-[run_type], respectively. Is thermo_id an alias for entry_id in the new API?

I’m going to set up an index for entries locally. The index should be unique and in one-to-one correspondence with the entries. However, before your fix I found that one entry_id could correspond to multiple entries.

I’m confused about which ID can serve as a unique index for entries:

  1. Does an entry_id always correspond to a unique entry?
  2. Does a thermo_id always correspond to a unique entry? I noticed that thermo_id is unique for a ThermoDoc, but I’m not sure whether a ThermoDoc always contains a single entry.
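
To illustrate the two id formats mentioned above, [MPID]_[thermo_type] and [MPID]-[run_type], here is a small parsing sketch. The helper names and the example ids are illustrative; the only assumption is that ids follow exactly those two patterns.

```python
def split_entry_id(entry_id):
    """Split '[MPID]-[run_type]' at the last hyphen."""
    material_id, run_type = entry_id.rsplit("-", 1)
    return material_id, run_type


def split_thermo_id(thermo_id):
    """Split '[MPID]_[thermo_type]' at the first underscore."""
    material_id, thermo_type = thermo_id.split("_", 1)
    return material_id, thermo_type


print(split_entry_id("mp-1185455-GGA"))
print(split_thermo_id("mp-1185455_GGA_GGA+U_R2SCAN"))
```

The entry id is split from the right because the material id itself contains a hyphen, while the thermo id is split from the left because the thermo type can contain underscores.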

Thanks!
Kai Pei

@peikai no problem! I’ll ping you when the data is live.

This is actually a timely question. I believe we just merged in changes to the __eq__ method for Entry objects in pymatgen in response to a discussion around unique identifiers for them. The change alters equality evaluation to not just use the entry_id (MPID + run type), but also include the correction. entry_id + correction type and amount always identifies a unique entry.

That being said, the ThermoDoc data from the API only contains a single “blessed” entry for a specific material. This corresponds to the one ComputedStructureEntry object chosen by the mixing scheme that is reflected in the thermo_type. In principle, the thermo_type is probably enough to index the entry data, unless you are going to store multiple sets of entries processed by the same mixing scheme class.
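
A minimal sketch of a local index built on the idea described above, that entry_id plus correction type and amount identifies a unique entry. This is not pymatgen's `__eq__` implementation; `entry_key` and `index_entries` are hypothetical helpers, and the stub objects only mimic the `entry_id`, `energy_adjustments`, `name`, and `value` attributes that pymatgen entries and energy adjustments expose.

```python
from types import SimpleNamespace


def entry_key(entry, ndigits=6):
    """Build a hashable key from the entry_id plus each adjustment's name and value."""
    adjustments = tuple(
        sorted((adj.name, round(adj.value, ndigits)) for adj in entry.energy_adjustments)
    )
    return (entry.entry_id, adjustments)


def index_entries(entries):
    """Map each key to its entry, raising if two entries share a key."""
    index = {}
    for entry in entries:
        key = entry_key(entry)
        if key in index:
            raise ValueError(f"duplicate key: {key}")
        index[key] = entry
    return index


# Hypothetical stand-ins: the same entry_id with and without a correction.
plain = SimpleNamespace(entry_id="mp-1-GGA", energy_adjustments=[])
adjusted = SimpleNamespace(
    entry_id="mp-1-GGA",
    energy_adjustments=[SimpleNamespace(name="MP2020 anion correction", value=-0.462)],
)

index = index_entries([plain, adjusted])
print(len(index))
```

Rounding the adjustment value guards against tiny floating-point differences; the number of digits is an arbitrary choice here.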

– Jason

@peikai, the data should be updated. Sorry for the late reply. Took longer than expected to make the fix and propagate it through our pipelines.

– Jason