Handling partial occupancy in CIF files

I’m working on calculations involving Voronoi tessellation and bond valence site energy. One hurdle is that these calculations require CIF files with full occupancy. While looking for solutions, I’ve come across the Supercell program, enumlib, and similar tools. However, I’m not sure how to choose a supercell that stays close to the original compound. Can anyone offer guidance on this? I’d also appreciate insights into computational strategies for handling partial occupancy in general.

Thanks in advance!

Hi @jrjfonseca, if I understand correctly, you have a set of disordered structures and need a way to make ordered representations of them. Is that right?

Typically for alloys (metals or even semiconductors like In_xGa_(1-x)As), you’d use a special quasirandom structure (SQS). An SQS is an ordered supercell whose short-range correlation functions are chosen to match those of the ideal random alloy as closely as possible. It optimizes randomness rather than energy, so there is no guarantee that the SQS representation is close to the convex hull. You can create SQSes with pymatgen.transformations.advanced_transformations.SQSTransformation, which needs the external ATAT package (its mcsqs program).
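To make "matching the random alloy" concrete, here is a toy sketch that uses neither ATAT nor pymatgen. It places A and B atoms on a small periodic 1D chain, enumerates every arrangement with the requested composition, and keeps the one whose pair correlations Pi_k = <s_i * s_(i+k)> (with s = +1 for A and -1 for B) are closest to the ideal random-alloy value (1 - 2x)^2. The 1D chain, the restriction to pair clusters up to a few neighbor shells, and the exhaustive search (in place of the Monte Carlo search that ATAT's mcsqs runs on real 3D lattices) are simplifying assumptions, and the function names are hypothetical.

```python
from itertools import combinations


def pair_correlations(spins, max_shell):
    """Pair correlations Pi_k = <s_i * s_(i+k)> on a periodic 1D chain, k = 1..max_shell."""
    n = len(spins)
    return [
        sum(spins[i] * spins[(i + k) % n] for i in range(n)) / n
        for k in range(1, max_shell + 1)
    ]


def toy_sqs(n_sites, n_b, max_shell=3):
    """Hypothetical toy SQS search: try every A/B arrangement on a ring and keep the
    one whose pair correlations best match those of an ideal random alloy."""
    x_b = n_b / n_sites
    # Ideal random alloy: Pi_k = <s>^2 = (1 - 2x)^2 for every shell (A = +1, B = -1).
    target = (1 - 2 * x_b) ** 2
    best, best_score = None, float("inf")
    for b_sites in combinations(range(n_sites), n_b):
        spins = [-1 if i in b_sites else 1 for i in range(n_sites)]
        # Sum of squared deviations from the random-alloy correlations.
        score = sum((pi - target) ** 2 for pi in pair_correlations(spins, max_shell))
        if score < best_score:
            best, best_score = spins, score
    return "".join("A" if s == 1 else "B" for s in best), best_score
```

For example, `toy_sqs(8, 4)` returns an eight-site A/B string together with its mismatch score. The search is exhaustive, so it only scales to tiny cells; that is why real SQS generators use stochastic searches over much larger supercells and richer cluster sets.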

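Another option, which you mentioned, is an enumeration tool like enumlib. Pymatgen wraps enumlib in pymatgen.transformations.advanced_transformations.EnumerateStructureTransformation, which enumerates symmetrically distinct ordered supercells and can rank them by Ewald electrostatic energy (or by M3GNet energy). There is also pymatgen.transformations.standard_transformations.OrderDisorderedStructureTransformation, which orders a supercell you have already built by minimizing its Ewald electrostatic energy. Because these approaches favor low-energy orderings, they may give you an ordered representation that is closer to the hull.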
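Whichever tool you use, the ordered cell has to contain a whole number of atoms of each species on each partially occupied site, otherwise the composition cannot match the original compound exactly. A useful first step in choosing a supercell is therefore to find the smallest number of unit cells n for which occupancy × (copies of the site in the cell) × n is an integer for every entry. The sketch below does that with exact fractions; the function name, the `(occupancy, count)` input format, and the `max_denominator` cutoff used to snap noisy CIF values like 0.333 to 1/3 are all assumptions for illustration.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def min_supercell_multiple(sites, max_denominator=24):
    """Hypothetical helper: smallest number of unit cells n such that
    occupancy * count * n is a whole number of atoms for every entry.

    sites: iterable of (occupancy, count) pairs, one per species on each partially
    occupied site, where count is how many symmetry-equivalent copies of that site
    the unit cell contains.
    """
    denominators = []
    for occupancy, count in sites:
        # Snap rounded CIF occupancies (e.g. 0.333) to a nearby simple fraction (1/3).
        atoms_per_cell = Fraction(occupancy).limit_denominator(max_denominator) * count
        denominators.append(atoms_per_cell.denominator)
    # n must clear every denominator at once.
    return lcm(*denominators)
```

For instance, a cell with four copies of a site at occupancy 0.5 and two copies of another at 0.333 needs `min_supercell_multiple([(0.5, 4), (0.333, 2)])` unit cells. Treat the result as a lower bound: a larger multiple gives the enumeration or SQS search more freedom to mimic the disorder, at the cost of more atoms and more candidate orderings, and the n cells still have to be arranged into a reasonably compact scaling matrix (e.g. for `Structure.make_supercell`).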

Hope that helps you get started looking at options!