How to correctly handle non-stoichiometric compounds using pymatgen?

I am having trouble processing a CIF file for a non-stoichiometric structure. pymatgen reports: UserWarning: Issues encountered while parsing CIF: Some occupancies ([1.5]) sum to > 1. How can I properly resolve this? Can I simply raise the occupancy_tolerance parameter to 1.5? What are the potential risks of doing so?

Some CIFs are badly constructed and give total site occupancies > 1. Physically, this can’t make sense, since the partial occupancies on a site should sum to at most 1 (exactly 1 if the site has no vacancies).

I would avoid increasing the occupancy_tolerance too far beyond 1 just to parse the CIF, unless you then rescale each site's occupancies so that they sum to 1, e.g.:

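from pymatgen.core import Structure

# Loose tolerance so the over-occupied site is accepted; format is inferred from the .cif extension
orig = Structure.from_file("bad.cif", occupancy_tolerance=1.5)
sites = orig.copy().sites
for site in sites:
    # Divide each species amount by the site total so the occupancies sum to 1
    total = sum(site.species.values())
    site.species = {k: v / total for k, v in site.species.items()}
norm_struct = Structure.from_sites(sites)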
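
One caveat with the loop above: it rescales every site, including sites whose occupancies legitimately sum to less than 1 because of vacancies. Here is a minimal sketch of a more conservative rule, written in plain Python so it runs without pymatgen. `rescale_if_over` is a hypothetical helper name, the dicts stand in for `site.species`, and the occupancy values are made up:

```python
def rescale_if_over(species, tol=1e-8):
    """Scale occupancies down to sum to 1 only if they exceed 1.

    `species` maps a species label to its occupancy. Sites that sum to
    1 or less are returned unchanged, so partially vacant sites survive.
    """
    total = sum(species.values())
    if total <= 1 + tol:
        return dict(species)
    return {label: occ / total for label, occ in species.items()}


# Made-up over-occupied site: 0.9 + 0.6 = 1.5, so both values are divided by 1.5
fixed = rescale_if_over({"Fe": 0.9, "Ni": 0.6})
# A site with 20% vacancies is left alone
vacant = rescale_if_over({"Li": 0.8})
```

Whether proportional rescaling is the right correction still depends on what the CIF was supposed to say, so treat this as a way to get a parseable structure rather than a fix for the underlying data.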

Thanks for your reply! So if I rescale the total occupancy to 1 after parsing the CIF, would a larger occupancy_tolerance be reasonable?

Only for the purpose of normalizing the occupancies. At the same time, you might want to investigate why the occupancies are wrong. If the CIF comes from a database, this is a common problem, and you’d have to check the original paper to correct it.
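
To see which sites to check against the paper, it can help to list the ones whose occupancies exceed 1 before rescaling anything. This is a rough sketch in plain Python: `over_occupied` and the example data are made up for illustration, and with pymatgen you would presumably build the input from each site's `site.species.as_dict()`.

```python
def over_occupied(sites, tol=1e-8):
    """Return (index, total) for each site whose occupancies sum to > 1.

    `sites` is a list of dicts mapping a species label to its occupancy.
    """
    flagged = []
    for i, species in enumerate(sites):
        total = sum(species.values())
        if total > 1 + tol:
            flagged.append((i, total))
    return flagged


# Made-up example: only the second site (index 1) is over-occupied
example_sites = [{"O": 1.0}, {"Fe": 0.9, "Ni": 0.6}, {"Li": 0.8}]
report = over_occupied(example_sites)
```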