Discrepancy between oxidation states obtained through bond_valence in pymatgen and the oxidation_states route of the Materials Project

AntObi · September 13, 2023, 5:43pm

I’ve noticed that the Materials Project provides oxidation states for some structures that BVAnalyzer in pymatgen cannot assign oxidation states to.
One example is mp-1208519: I can obtain an oxidation-state-decorated structure through the oxidation_states route, but BVAnalyzer fails to decorate the structure. The code below demonstrates this.

from mp_api.client import MPRester
from pymatgen.analysis.bond_valence import BVAnalyzer as BVA
import os

API_KEY = os.environ["MP_API_KEY"]

with MPRester(API_KEY) as mpr:
    doc = mpr.materials.oxidation_states.get_data_by_id("mp-1208519")
    print(doc.structure)
    structure = mpr.get_structure_by_material_id("mp-1208519")
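    # get_oxi_state_decorated_structure returns a new, decorated copy rather than
    # modifying `structure` in place; for this material it raises a ValueError
    decorated = BVA().get_oxi_state_decorated_structure(structure)
    print(decorated)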

I was wondering whether an updated bv_params file, not available in pymatgen, has been used to calculate the oxidation states for Materials Project entries.

For additional context: of the 154,718 entries in the Materials Project, I can get oxidation states for 116,363 materials through the oxidation_states route of the API, but only for 103,687 structures when I use BVAnalyzer to decorate each structure obtained through the summary route.
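To put those counts in perspective, here is a quick calculation of each route's coverage, using only the numbers quoted above. Note that the difference is a difference in counts; it assumes nothing about which specific materials each route covers.

```python
# Counts quoted in the post above
total_entries = 154_718
via_oxidation_states_route = 116_363
via_local_bvanalyzer = 103_687

coverage_api = via_oxidation_states_route / total_entries
coverage_bva = via_local_bvanalyzer / total_entries
gap = via_oxidation_states_route - via_local_bvanalyzer

print(f"oxidation_states route: {coverage_api:.1%}")  # 75.2%
print(f"local BVAnalyzer:       {coverage_bva:.1%}")  # 67.0%
print(f"difference in count:    {gap}")               # 12676
```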

TL;DR: Why is there a discrepancy between the results of BVAnalyzer in pymatgen and the oxidation_states route of the Materials Project API?

munrojm · September 13, 2023, 8:23pm

If the BVAnalyzer fails for a material in the build process, the oxidation state data is obtained from the Composition.oxi_state_guesses method.

– Jason

AntObi · September 13, 2023, 9:07pm

Thanks for the reply, Jason! Sticking with this example, Composition.oxi_state_guesses returns {'As': 0.5, 'O': -2.0, 'Tb': 3.0} as the most probable oxidation states, based on my understanding of the method.

from mp_api.client import MPRester
from pymatgen.analysis.bond_valence import BVAnalyzer as BVA
import os
import pprint

API_KEY = os.environ["MP_API_KEY"]

with MPRester(API_KEY) as mpr:
    doc = mpr.materials.oxidation_states.get_data_by_id("mp-1208519")
    #print(doc.structure)
    print(doc.possible_species)
    structure = mpr.get_structure_by_material_id("mp-1208519")
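    try:
        # Returns a new decorated structure; raises ValueError if valences cannot be assigned
        decorated = BVA().get_oxi_state_decorated_structure(structure)
        print(decorated)
    except ValueError:
        # Fall back to composition-based guesses, most probable first
        pprint.pprint(structure.composition.oxi_state_guesses(max_sites=len(structure)))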

Output:

Retrieving OxidationStateDoc documents: 100%|██████████| 1/1 [00:00<00:00, 4415.06it/s]

['As0+', 'Tb4+', 'O2-']

Retrieving MaterialsDoc documents: 100%|██████████| 1/1 [00:00<00:00, 6921.29it/s]

({'As': 0.5, 'O': -2.0, 'Tb': 3.0},
{'As': 0.0, 'O': -2.0, 'Tb': 4.0},
{'As': 0.25, 'O': -2.0, 'Tb': 3.5},
{'As': 0.375, 'O': -2.0, 'Tb': 3.25},
{'As': 0.125, 'O': -2.0, 'Tb': 3.75})
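All five guesses above are charge balanced. The fractional values come from averaging integer states over several sites. Tb4+ with As0 and O2- balances only with two O per Tb, and Tb3+ with As+0.5 then needs two As per Tb, so the guesses imply a 1:2:2 Tb:As:O ratio. That is inferred from the guesses, not checked against the mp-1208519 structure. The stdlib sketch below assumes a hypothetical cell of 4 Tb, 8 As and 8 O (the smallest 1:2:2 cell giving As averages in eighths), each Tb at +3 or +4, every O at -2, and the As sites absorbing the remaining charge. It reproduces which averages are possible but not their order: pymatgen also ranks guesses by oxidation-state probabilities, which this sketch does not model.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# Hypothetical cell with a 1:2:2 Tb:As:O ratio (assumed for illustration)
counts = {"Tb": 4, "As": 8, "O": 8}
tb_states = (3, 4)  # assumed candidate states for Tb
o_state = -2        # O fixed at -2

averages = []
for tb_combo in combinations_with_replacement(tb_states, counts["Tb"]):
    tb_total = sum(tb_combo)
    o_total = o_state * counts["O"]
    as_total = -(tb_total + o_total)  # As absorbs whatever charge is left
    averages.append({
        "As": Fraction(as_total, counts["As"]),
        "O": Fraction(o_state),
        "Tb": Fraction(tb_total, counts["Tb"]),
    })

for avg in averages:
    print({el: float(v) for el, v in avg.items()})
```

This prints the same five dictionaries as above, ordered by Tb total (from Tb 3.0 to Tb 4.0) rather than by probability.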

According to the documentation, the returned guesses are ordered from most to least probable. For mp-1208519, the oxidation states shown on the Materials Project website and returned by the API match the second guess in this list. Is preference given to all-integer oxidation states, with the most probable guess used only if no all-integer solution is found?
Also, is the build process for this route documented somewhere that I have missed?

munrojm · September 13, 2023, 9:21pm

We shouldn’t be giving a preference, so I will have to take a closer look at that. The exact code we use to populate the document is in emmet: https://github.com/materialsproject/emmet/blob/4634eac272d321614db4ab5558832ceee19ad130/emmet-core/emmet/core/oxidation_states.py

– Jason

AntObi · September 13, 2023, 9:54pm

Thanks Jason.
Using the code in emmet, I can get the same oxidation states for this structure as through the oxidation_states route.

I just needed to set max_sites=-50 when calling the Composition.oxi_state_guesses method. I hadn't done that in my previous reply, where I used max_sites=len(structure).
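For context on what a negative max_sites does, here is my reading of pymatgen's Composition.oxi_state_guesses source; check it against your installed version. max_sites=-1 reduces the formula fully, and a value below -1 reduces it and then scales it back up by abs(max_sites) // (atoms in the reduced formula). The size of that cell sets how finely the per-element averages can be fractional, so changing max_sites can change both the candidate guesses and their ranking. The stdlib sketch below mimics only this sizing step. The function name `resize_for_max_sites` and the Tb4As8O8 input are hypothetical.

```python
from functools import reduce
from math import gcd
from typing import Dict


def resize_for_max_sites(counts: Dict[str, int], max_sites: int) -> Dict[str, int]:
    """Mimic the (assumed) negative-max_sites handling: reduce, then scale up."""
    if max_sites >= 0:
        raise ValueError("this sketch only covers negative max_sites")
    divisor = reduce(gcd, counts.values())
    reduced = {el: n // divisor for el, n in counts.items()}
    n_atoms = sum(reduced.values())
    if max_sites < -1 and n_atoms < abs(max_sites):
        factor = abs(max_sites) // n_atoms
        reduced = {el: n * factor for el, n in reduced.items()}
    return reduced


print(resize_for_max_sites({"Tb": 4, "As": 8, "O": 8}, -1))   # {'Tb': 1, 'As': 2, 'O': 2}
print(resize_for_max_sites({"Tb": 4, "As": 8, "O": 8}, -50))  # {'Tb': 10, 'As': 20, 'O': 20}
```

With 20 As sites, the average As state can move in steps of 1/20 instead of the 1/8 available in a 4 Tb, 8 As, 8 O cell.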

My problem is solved, but out of curiosity, why max_sites=-50? Is that an arbitrary choice, or is there a specific reason (i.e. why not -80, -1, etc.)?

tsmathis · February 22, 2024, 11:39pm

Thread closed due to inactivity, please open a new thread to address related issues.