Is it Expected for Output Structure to Deviate from FCC Seed Framework When Using Pymatgen SQSTransformation with mcsqs?

Hello pymatgen community,

I’m using pymatgen’s SQSTransformation to generate a special quasirandom structure (SQS) for a binary Cu-Mn system, with the Cu FCC structure as the seed framework. I am using the mcsqs method underneath. After applying the transformation using apply_transformation, I used FrameworkComparator to check the similarity between the output structure and the original FCC seed structure. However, the output structure doesn’t match the initial FCC structure as I expected.

Is this deviation from the original FCC seed structure expected when using SQSTransformation with the mcsqs method, or is this a red flag indicating something may be wrong with my setup or the transformation process? I would appreciate any insights into whether this is typical behavior or if there’s something I should adjust in my workflow.

Thank you in advance for your help!

Best,
Nadav

Hi @Nadav_Moav, the structure you get on output should have the same underlying fcc structure, although the crystalline symmetry may be in a different representation. A way you can check this is:

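from pymatgen.transformations.advanced_transformations import SQSTransformation

# disordered_structure is your Cu/Mn seed; pass your own SQSTransformation arguments (e.g. scaling)
mcsqs_struct = SQSTransformation().apply_transformation(disordered_structure)

mcsqs_struct_copy = mcsqs_struct.copy()
mcsqs_struct_copy.replace_species({'Cu': 'Mn'})  # replaces all Cu sites with Mn in-place
print(disordered_structure.get_space_group_info(), mcsqs_struct_copy.get_space_group_info())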

You should get the same symmetries from both structures.

Hi Aaron,

Thanks again for the suggestion — it worked!

Just to understand this better: I thought FrameworkMatcher is supposed to ignore species when comparing structures, so I was expecting it to recognize the same framework even without manually replacing the species.
Also, in one test case I tried, the output SQS had exactly the same space group and lattice parameters as the seed structure, but FrameworkMatcher still didn’t recognize them as matching until I applied your trick of replacing all species with the same element.

Here’s the information from that test case:

Seed structure:
K1 Ti1 O3
1.0
3.6917200000000001 0.0000000000000000 0.0000000000000002
-0.0000000000000002 3.6917200000000001 0.0000000000000002
0.0000000000000000 0.0000000000000000 5.7668970000000002
K Ti O
1 1 3
direct
0.5000000000000000 0.5000000000000000 0.9962770000000000 K
0.0000000000000000 0.0000000000000000 0.4947610000000000 Ti
0.0000000000000000 0.5000000000000000 0.6058740000000000 O
0.5000000000000000 0.0000000000000000 0.6058740000000000 O
0.0000000000000000 0.0000000000000000 0.1946150000000000 O

Output SQS structure:
K4 Ti8 Bi4 O24
1.0
7.3834400000000002 0.0000000000000000 0.0000000000000000
0.0000000000000000 7.3834400000000002 0.0000000000000000
0.0000000000000000 0.0000000000000000 11.5337940000000003
K Bi K Bi K Bi Ti O
2 1 1 1 1 2 8 24
direct
0.2500000000000000 0.2500000000000000 0.4981380000000000 K
0.2500000000000000 0.7500000000000000 0.4981380000000000 K
0.2500000000000000 0.2500000000000000 0.9981380000000000 Bi
0.2500000000000000 0.7500000000000000 0.9981380000000000 K
0.7500000000000000 0.2500000000000000 0.4981380000000000 Bi
0.7500000000000000 0.7500000000000000 0.4981380000000000 K
0.7500000000000000 0.2500000000000000 0.9981380000000000 Bi
0.7500000000000000 0.7500000000000000 0.9981380000000000 Bi
1.0000000000000000 1.0000000000000000 0.2473800000000000 Ti
0.5000000000000000 1.0000000000000000 0.2473800000000000 Ti
1.0000000000000000 1.0000000000000000 0.7473800000000000 Ti
0.5000000000000000 1.0000000000000000 0.7473800000000000 Ti
1.0000000000000000 0.5000000000000000 0.2473800000000000 Ti
0.5000000000000000 0.5000000000000000 0.2473800000000000 Ti
1.0000000000000000 0.5000000000000000 0.7473800000000000 Ti
0.5000000000000000 0.5000000000000000 0.7473800000000000 Ti
0.2500000000000000 1.0000000000000000 0.3029370000000000 O
0.2500000000000000 1.0000000000000000 0.8029370000000000 O
0.2500000000000000 0.5000000000000000 0.3029370000000000 O
0.2500000000000000 0.5000000000000000 0.8029370000000000 O
0.7500000000000000 1.0000000000000000 0.3029370000000000 O
0.7500000000000000 1.0000000000000000 0.8029370000000000 O
0.7500000000000000 0.5000000000000000 0.3029370000000000 O
0.7500000000000000 0.5000000000000000 0.8029370000000000 O
1.0000000000000000 0.2500000000000000 0.3029370000000000 O
1.0000000000000000 0.2500000000000000 0.8029370000000000 O
1.0000000000000000 0.7500000000000000 0.3029370000000000 O
1.0000000000000000 0.7500000000000000 0.8029370000000000 O
0.5000000000000000 0.2500000000000000 0.3029370000000000 O
0.5000000000000000 0.2500000000000000 0.8029370000000000 O
0.5000000000000000 0.7500000000000000 0.3029370000000000 O
0.5000000000000000 0.7500000000000000 0.8029370000000000 O
1.0000000000000000 1.0000000000000000 0.0973070000000000 O
0.5000000000000000 1.0000000000000000 0.0973070000000000 O
1.0000000000000000 1.0000000000000000 0.5973070000000000 O
0.5000000000000000 1.0000000000000000 0.5973070000000000 O
1.0000000000000000 0.5000000000000000 0.0973070000000000 O
0.5000000000000000 0.5000000000000000 0.0973070000000000 O
1.0000000000000000 0.5000000000000000 0.5973070000000000 O
0.5000000000000000 0.5000000000000000 0.5973070000000000 O

Do you have an idea why that happens? Is there something else that FrameworkMatcher is sensitive to beyond species and basic lattice parameters?

Thanks again for your time and help!

Best,
Nadav

If you’re using the StructureMatcher class with the FrameworkComparator, you also have to set attempt_supercell=True for the check to work. Not sure if that’s what you meant by FrameworkMatcher?

If you first replace species in the MCSQS structure to match the original composition, the structures can match even with attempt_supercell=False, because primitive cell reduction is then able to shrink the MCSQS cell.

See this code for details. The last line prints True for a match between the unmodified MCSQS structure and the original structure:

from pymatgen.core import Structure
from pymatgen.analysis.structure_matcher import StructureMatcher, FrameworkComparator

inp_struct = Structure.from_str("""K1 Ti1 O3
1.0
3.6917200000000001 0.0000000000000000 0.0000000000000002
-0.0000000000000002 3.6917200000000001 0.0000000000000002
0.0000000000000000 0.0000000000000000 5.7668970000000002
K Ti O
1 1 3
direct
0.5000000000000000 0.5000000000000000 0.9962770000000000 K
0.0000000000000000 0.0000000000000000 0.4947610000000000 Ti
0.0000000000000000 0.5000000000000000 0.6058740000000000 O
0.5000000000000000 0.0000000000000000 0.6058740000000000 O
0.0000000000000000 0.0000000000000000 0.1946150000000000 O
""", fmt = "poscar")

mcsqs_struct = Structure.from_str("""K4 Ti8 Bi4 O24
1.0
7.3834400000000002 0.0000000000000000 0.0000000000000000
0.0000000000000000 7.3834400000000002 0.0000000000000000
0.0000000000000000 0.0000000000000000 11.5337940000000003
K Bi K Bi K Bi Ti O
2 1 1 1 1 2 8 24
direct
0.2500000000000000 0.2500000000000000 0.4981380000000000 K
0.2500000000000000 0.7500000000000000 0.4981380000000000 K
0.2500000000000000 0.2500000000000000 0.9981380000000000 Bi
0.2500000000000000 0.7500000000000000 0.9981380000000000 K
0.7500000000000000 0.2500000000000000 0.4981380000000000 Bi
0.7500000000000000 0.7500000000000000 0.4981380000000000 K
0.7500000000000000 0.2500000000000000 0.9981380000000000 Bi
0.7500000000000000 0.7500000000000000 0.9981380000000000 Bi
1.0000000000000000 1.0000000000000000 0.2473800000000000 Ti
0.5000000000000000 1.0000000000000000 0.2473800000000000 Ti
1.0000000000000000 1.0000000000000000 0.7473800000000000 Ti
0.5000000000000000 1.0000000000000000 0.7473800000000000 Ti
1.0000000000000000 0.5000000000000000 0.2473800000000000 Ti
0.5000000000000000 0.5000000000000000 0.2473800000000000 Ti
1.0000000000000000 0.5000000000000000 0.7473800000000000 Ti
0.5000000000000000 0.5000000000000000 0.7473800000000000 Ti
0.2500000000000000 1.0000000000000000 0.3029370000000000 O
0.2500000000000000 1.0000000000000000 0.8029370000000000 O
0.2500000000000000 0.5000000000000000 0.3029370000000000 O
0.2500000000000000 0.5000000000000000 0.8029370000000000 O
0.7500000000000000 1.0000000000000000 0.3029370000000000 O
0.7500000000000000 1.0000000000000000 0.8029370000000000 O
0.7500000000000000 0.5000000000000000 0.3029370000000000 O
0.7500000000000000 0.5000000000000000 0.8029370000000000 O
1.0000000000000000 0.2500000000000000 0.3029370000000000 O
1.0000000000000000 0.2500000000000000 0.8029370000000000 O
1.0000000000000000 0.7500000000000000 0.3029370000000000 O
1.0000000000000000 0.7500000000000000 0.8029370000000000 O
0.5000000000000000 0.2500000000000000 0.3029370000000000 O
0.5000000000000000 0.2500000000000000 0.8029370000000000 O
0.5000000000000000 0.7500000000000000 0.3029370000000000 O
0.5000000000000000 0.7500000000000000 0.8029370000000000 O
1.0000000000000000 1.0000000000000000 0.0973070000000000 O
0.5000000000000000 1.0000000000000000 0.0973070000000000 O
1.0000000000000000 1.0000000000000000 0.5973070000000000 O
0.5000000000000000 1.0000000000000000 0.5973070000000000 O
1.0000000000000000 0.5000000000000000 0.0973070000000000 O
0.5000000000000000 0.5000000000000000 0.0973070000000000 O
1.0000000000000000 0.5000000000000000 0.5973070000000000 O
0.5000000000000000 0.5000000000000000 0.5973070000000000 O
""", fmt = "poscar")

matcher = StructureMatcher(
    comparator=FrameworkComparator(),
    attempt_supercell=True,
)

print(matcher.fit(inp_struct, mcsqs_struct))

Hi, thanks for the explanation and the code example. It worked perfectly.

I’m still trying to wrap my head around why I need to use either attempt_supercell=True or manually call .replace_species() on the MCSQS structure for it to match the seed. Without one of those, the match fails, even though the underlying framework is the same.

What’s confusing is that when I first tested FrameworkComparator, I compared a structure from Materials Project with a supercell version I created in VESTA — and the match worked without attempt_supercell=True. So I assumed FrameworkComparator was already handling that kind of comparison internally.

But now, in this SQS case, it only works if I explicitly guide it — either by using .replace_species() to revert the MCSQS structure to the seed composition, even when I thought the species aren’t relevant for FrameworkComparator, or by enabling attempt_supercell. Why does the framework match succeed in one case and not the other?

Is it something about how the atom types or ordering are handled internally?

Appreciate any insight — feels like there’s a subtle piece I’m missing.

Best,
Nadav

Sorry I missed this! The StructureMatcher logic is a bit confusing, but the docstring is helpful in this case:

attempt_supercell (bool): If set to True and number of sites in
cells differ after a primitive cell reduction (divisible by an
integer) attempts to generate a supercell transformation of the
smaller cell which is equivalent to the larger structure.

The key here is that StructureMatcher can’t match sites that aren’t present in a structure. It doesn’t generate all possible images of sites within inp_struct. Thus the 5 sites in inp_struct cannot be matched to those in mcsqs_struct, which has 40 sites.

When attempt_supercell is True, StructureMatcher scales inp_struct by 8 and then both structures match.
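The factor of 8 can be read straight off the two POSCARs in this thread. Here is a minimal pure-Python sketch of that arithmetic. The helper names `site_multiplicity` and `diagonal_scaling` are made up for illustration and are not pymatgen functions, and the sketch assumes both cells share the same axes (true here), so only the lattice lengths need comparing:

```python
# Illustrative helpers only; these are not part of pymatgen.

def site_multiplicity(n_small, n_large):
    """Return n_large / n_small if it is a whole number, else None."""
    quotient, remainder = divmod(n_large, n_small)
    return quotient if remainder == 0 else None


def diagonal_scaling(small_lengths, large_lengths, tol=1e-6):
    """Return integer multipliers along a, b, c, or None if a ratio is non-integer.

    Assumes the two cells share the same axes, so only lengths are compared.
    """
    factors = []
    for small, large in zip(small_lengths, large_lengths):
        ratio = large / small
        if abs(ratio - round(ratio)) > tol:
            return None
        factors.append(round(ratio))
    return tuple(factors)


# Lattice lengths and site counts copied from the two POSCARs in this thread.
seed_lengths = (3.69172, 3.69172, 5.766897)
sqs_lengths = (7.38344, 7.38344, 11.533794)
seed_sites = 1 + 1 + 3                      # K1 Ti1 O3
sqs_sites = 2 + 1 + 1 + 1 + 1 + 2 + 8 + 24  # K4 Ti8 Bi4 O24

multiplicity = site_multiplicity(seed_sites, sqs_sites)
scaling = diagonal_scaling(seed_lengths, sqs_lengths)
print(multiplicity, scaling)  # 8 (2, 2, 2)
```

So the 40-site cell is a 2x2x2 repeat of the 5-site seed, which is the supercell that has to be built (by StructureMatcher or by hand) before the sites can be paired up.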

Alternatively, you could disable primitive cell reduction and scale inp_struct yourself by a factor of 8:

StructureMatcher(
    comparator=FrameworkComparator(),
    attempt_supercell=False,
    primitive_cell=False,
).fit(inp_struct * (2, 2, 2), mcsqs_struct)  # returns True

(The key is disabling primitive cell reduction, or the supercell scaling of inp_struct will be undone.) Hope that helps!
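As far as I understand it, this is also why a plain supercell (like the one made in VESTA) matches without attempt_supercell while the SQS cell does not: primitive cell reduction looks at the actual species, and only the comparison afterwards ignores them. The sketch below is not pymatgen code and makes no pymatgen calls. It takes the eight K/Bi positions from the SQS POSCAR above, indexes each by its position (0 or 1) along a, b and c in the 2x2x2 cell, and checks which translations by a seed lattice vector map every site onto a site of the same species. The function name `preserved_translations` is hypothetical:

```python
from itertools import product

# A-site occupants of the SQS cell, indexed by position (0 or 1) along a, b, c
# within the 2x2x2 supercell (x, y = 0.25 or 0.75; z = 0.498 or 0.998).
a_sites = {
    (0, 0, 0): "K",  (0, 1, 0): "K",
    (0, 0, 1): "Bi", (0, 1, 1): "K",
    (1, 0, 0): "Bi", (1, 1, 0): "K",
    (1, 0, 1): "Bi", (1, 1, 1): "Bi",
}


def preserved_translations(sites):
    """Return non-zero seed-lattice shifts that map each site onto the same species."""
    kept = []
    for shift in product((0, 1), repeat=3):
        if shift == (0, 0, 0):
            continue
        if all(
            sites[tuple((i + s) % 2 for i, s in zip(pos, shift))] == species
            for pos, species in sites.items()
        ):
            kept.append(shift)
    return kept


# With K and Bi distinguished, none of the seed translations survives,
# so the 40-site cell cannot shrink along any of them.
print(preserved_translations(a_sites))  # []

# With every A site relabelled as K (the replace_species trick), all seven
# translations survive and the cell can reduce back to the 5-site seed.
all_k = {pos: "K" for pos in a_sites}
print(len(preserved_translations(all_k)))  # 7
```

A supercell with no substitutions behaves like the relabelled case, so it reduces to the same 5-site cell as the seed and the site counts agree. The K/Bi ordering breaks those translations, which leaves 5 sites against 40 unless attempt_supercell=True is set or the seed is scaled by hand with primitive_cell=False.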