Charge-balance analysis using pymatgen

I’m trying to better understand charge-balance analysis of chemical compositions using the functions offered by pymatgen.

In particular, I’m interested in this because I’m working with generative machine learning models for inorganic compositions, and I want to assess how many of the compositions sampled from my trained models are charge-balanced.

Something that is not clear to me: this kind of analysis (for example with oxi_state_guesses()) seems to require compositions with integer coefficients.

As a non-materials scientist, my understanding is that non-stoichiometric compounds (whose coefficients can take any value in a continuous range) are very common in inorganic chemistry, and for these compounds charge-balance analysis with automated tools no longer seems straightforward.

That’s why I’m wondering whether there is some way, using pymatgen, to convert the non-stoichiometric compounds in my dataset into renormalized stoichiometric ones. Here is an example of what I would expect to obtain:

Let’s say my dataset contains the compound Hg0.7 Cd0.3 Te1. It is charge-balanced with oxidation numbers (+2, +2, -2), since 0.7 × 2 + 0.3 × 2 - 1 × 2 = 0.
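Once the oxidation states are fixed, the charge-balance condition is just a weighted sum, so it can be checked on the fractional amounts directly. A minimal sketch in plain Python, assuming the oxidation states are already known; `charge_sum` and its `max_denominator` cutoff are hypothetical names for illustration, not pymatgen functions. `Fraction` is used so that 0.7 × 2 + 0.3 × 2 - 2 comes out as exactly zero rather than a tiny floating-point residue.

```python
from fractions import Fraction


def charge_sum(amounts, oxidation_states, max_denominator=1000):
    """Return sum(amount * oxidation state) over all elements as an exact Fraction.

    amounts: element symbol -> amount (int, float or decimal string)
    oxidation_states: element symbol -> assumed integer oxidation state
    """
    total = Fraction(0)
    for element, amount in amounts.items():
        # Snap floats like 0.7 to the nearby simple fraction 7/10 to avoid rounding residue
        exact = Fraction(amount).limit_denominator(max_denominator)
        total += exact * oxidation_states[element]
    return total


amounts = {"Hg": 0.7, "Cd": 0.3, "Te": 1}
states = {"Hg": 2, "Cd": 2, "Te": -2}
print(charge_sum(amounts, states))  # 0, i.e. charge-balanced
```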

From the basic chemistry rules I remember from high school, if we divide by the smallest coefficient and then multiply by a suitable small integer, we obtain the normalized compound. In this case we would get:

Hg0.7 Cd0.3 Te1 → Hg7 Cd3 Te10

and the resulting composition is, of course, still charge-balanced with oxidation numbers (+2, +2, -2).
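The normalization step can be sketched in plain Python. This version swaps in a slightly different route to the same result as "divide by the smallest, then multiply": approximate each amount as a fraction, multiply by the least common multiple of the denominators, then divide by the greatest common divisor, so no multiplier has to be guessed. `to_integer_formula`, `format_formula` and the `max_denominator` cutoff are hypothetical names chosen for illustration, not pymatgen API.

```python
from fractions import Fraction
from math import gcd, lcm  # multi-argument gcd/lcm need Python 3.9+


def to_integer_formula(amounts, max_denominator=1000):
    """Scale fractional amounts to the smallest whole-number ratio.

    Each amount is approximated by a Fraction whose denominator is at most
    max_denominator, so float noise such as 0.30000000000000004 is read as 3/10.
    """
    fracs = {el: Fraction(amt).limit_denominator(max_denominator) for el, amt in amounts.items()}
    # Multiplying by the least common multiple of the denominators makes every amount whole
    scale = lcm(*(f.denominator for f in fracs.values()))
    counts = {el: int(f * scale) for el, f in fracs.items()}
    # Dividing by the greatest common divisor gives the smallest such ratio
    divisor = gcd(*counts.values())
    return {el: n // divisor for el, n in counts.items()}


def format_formula(counts):
    """Join element symbols and integer counts, e.g. {"Hg": 7, "Te": 10} -> "Hg7Te10"."""
    return "".join(f"{el}{n}" for el, n in counts.items())


print(format_formula(to_integer_formula({"Hg": 0.7, "Cd": 0.3, "Te": 1})))  # Hg7Cd3Te10
```

pymatgen's Composition class also has a get_integer_formula_and_factor() method that, as far as I know, performs a similar reduction and returns the integer formula together with the scaling factor; the resulting integer formula could then be passed to Composition(...).oxi_state_guesses().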

So, is there a similar way to convert my dataset into such an equivalent representation, so that I can check charge balance more quickly?
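If the allowed oxidation states for each element are known, one quicker alternative is to skip the integer conversion and search for one oxidation state per element directly on the fractional amounts. A minimal sketch, assuming a hypothetical `CANDIDATE_STATES` table (a real workflow would take these values from a reference source). As far as I understand, oxi_state_guesses() works on integer site counts precisely so that it can allow mixed valence; this sketch gives each element a single state, so a mixed-valence compound such as Fe3O4 is reported as unbalanced.

```python
from fractions import Fraction
from itertools import product

# Hypothetical, incomplete candidate table for illustration only; a real check would
# take the allowed oxidation states for each element from a reference source.
CANDIDATE_STATES = {
    "Hg": [1, 2],
    "Cd": [2],
    "Te": [-2, 2, 4, 6],
    "Fe": [2, 3],
    "O": [-2],
}


def find_balanced_assignments(amounts, candidates=CANDIDATE_STATES, max_denominator=1000):
    """Return every one-state-per-element assignment whose total charge is zero.

    Works directly on fractional amounts, but does not model mixed valence
    (e.g. Fe2+ and Fe3+ together in Fe3O4). Raises KeyError for elements
    missing from the candidate table.
    """
    elements = list(amounts)
    fracs = [Fraction(amounts[el]).limit_denominator(max_denominator) for el in elements]
    balanced = []
    # Exhaustively try each combination of one candidate oxidation state per element
    for states in product(*(candidates[el] for el in elements)):
        if sum(f * s for f, s in zip(fracs, states)) == 0:
            balanced.append(dict(zip(elements, states)))
    return balanced


def is_charge_balanced(amounts, candidates=CANDIDATE_STATES):
    return len(find_balanced_assignments(amounts, candidates)) > 0


print(find_balanced_assignments({"Hg": 0.7, "Cd": 0.3, "Te": 1}))
# [{'Hg': 2, 'Cd': 2, 'Te': -2}]
print(is_charge_balanced({"Fe": 3, "O": 4}))  # False: mixed valence is not modelled
```

The exhaustive search grows with the product of the candidate-list lengths, which should stay small for typical inorganic compositions with a handful of elements.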

Hey Federico,

I’m not sure if you’re still facing this problem, but is what you’re trying to do to determine an empirical formula from a non-empirical one?