Conserving Lattice Vectors in MCSQS + best result to choose

cpashartis · March 16, 2025, 4:59am

Hi Axel,

I ran multiple mcsqs simulations only to realize by the end that the lattice vectors of each final result are drastically different. Because of the code I need to run after the quasi-random generator, the atoms’ bonds need to be aligned with the lattice vectors. My original input file conserves this, but since I have to reduce down to the unit cell to run mcsqs, it doesn’t stay conserved. Is there any way to keep the same geometry as the initial input file?
If you need picture examples let me know
I’ll get back to you after I look at some papers that have justified the use of certain sqs structures on how to choose the best result.
This is a throwback to my previous post… is there any way to record the supercell each time the objective function is changed?
Pairs/Triplets etc. I’m not entirely sure yet what is meant by these, I browsed through the first couple references in your original paper (including the very first sqs paper reference), but they have no explicit mention of pairs/triplets etc.

Piggybacking off of this, how is it that the code knows the multiplicity of each cluster in the cluster expansion before knowing the supercell geometry?

Many questions, so thanks in advance (I tried to bold the important bits of each question).

cpashartis · March 16, 2025, 5:07am

Hi Axel,

Thanks for the great responses, I have to give some a go still but my primary concern right now is getting the correct lattice shape out.

(Basically how do you build it, manual doesn’t say much ?)
In that regard, I know that the first row in sqscell.out should be the number of supercells (how is this determined?). Additionally, the following lines are in three rows which I presume are the lattice vectors in cartesian for each supercell (how are these determined?). I looked through some of the code and there seems to be an arbitrary transformation matrix for the lattice vectors (probably since I don’t know the theory). Where can I look this up (unless you can explain it quick).

Just to be certain what I want is the atoms to be in the exact same spot as my original input file, i.e. translating the unit cell to the correct coordinates.

Regarding choosing the correct mcsqs, last time I talked to you, you mentioned using 3rd nearest neighbour for pairs, 2nd for triplets, 1st for quadruplets. Ultimately, and judging by the forums, the idea is to keep my correlations within 10% and have the largest number of pairs/triplets/quadruplets etc.
To that end, is it reasonable to consider only those 3 categories, or should I expand my search to be as large as possible then determine which has the correlations within 10%. I’m aware that there is a built-in feature to compare differences in pair, triplet etc. lengths - but it seems safer to simply run the cases.

In terms of arguing for my result, the most intuitive idea I could think of would be plotting the percent difference in the correlations (5th column/4th column in bestcorr.out), as well as plotting convergence on my final simulation using the sqs generator. Can you think of anything else that would provide a strong argument?

avdw · March 16, 2025, 5:15am

You can specify your own cell(s), see mcsqs -h :

You could edit mcsqs.c++. Just before

      ofstream strfile;

add


system("mv bestsqs2.out bestsqs3.out ; mv bestsqs1.out bestsqs2.out ; mv bestsqs.out bestsqs1.out");
system("mv bestcorr2.out bestcorr3.out ; mv bestcorr1.out bestcorr2.out ; mv bestcorr.out bestcorr1.out");

This would keep track of the 4 best sqs.
4. These represent the joint distribution of the occupation of the two, three, etc. sites.
Perhaps the following text book will help: F. Ducastelle. Order and Phase Stability in Alloys. (Sorry it takes quite a bit of time to write out the theory here…)
The multiplicity only depends on the lattice of sites, before they are occupied by specific atoms.

avdw · March 16, 2025, 5:24am

The file format is explained in maps -h
Sqscell.out should give lines 4-6 of a standard ATAT structure file.
In your case the number of supercells is just 1 since you want to force one specific cell.

cpashartis · March 16, 2025, 5:33am

Hi Axel,

This will probably be painfully short, but I still can’t get the 1 case to work. I get a segmentation fault… so either I’m implementing something wrong, or some assumption is made in the code (I have tracked down the variable name, but if you could check if I did something wrong first it would help). If I didn’t do anything wrong, I can file a bug report with what I found. (Is the code sensitive to empty lines?)

1

0.250000 0.000000 -0.250000
-0.000000 0.000000 -0.250000
-0.000000 0.250000 -0.250000

avdw · March 16, 2025, 5:37am

Can you give me your rndstr.in file ?

cpashartis · March 17, 2025, 12:06am

I’m using this one:

16.007990 0.000000 0.000000
8.003995 13.863326 0.000000
8.003995 4.621109 13.070469
0.250000 0.000000 -0.250000
-0.000000 0.000000 -0.250000
-0.000000 0.250000 -0.250000
0.062500 0.062500 -0.187500 As=0.859375,Bi=0.140625
0.250000 0.250000 -0.750000 Ga

avdw · March 17, 2025, 12:07am

I don’t see a seg fault when I run (Make sure you download the latest version, "beta version of the latest release")
But the error I do see is:
"Impossible to match point correlations due to incompatible supercell size."
The reason is that you specified a "supercell" of only one unit cell (one active site in this case). You need a larger cell to create an SQS. Also, the number of active sites in it must be such that composition*nb_of_site=integer .
Your composition is not close to a simple fraction, so the required cell may be very large.
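
As a rough illustration of the composition*nb_of_site=integer condition, here is a minimal Python sketch that finds the smallest number of active sites for which every site concentration gives a whole number of atoms. The function name `min_active_sites` is made up for this example and is not part of ATAT; concentrations are passed as strings so they convert to exact fractions.

```python
from fractions import Fraction
from math import lcm  # available in Python 3.9+


def min_active_sites(concentrations):
    """Smallest site count N such that c * N is an integer for every c.

    `concentrations` is a list of decimal strings, e.g. "0.859375".
    """
    fracs = [Fraction(c) for c in concentrations]
    # c * N is an integer exactly when N is a multiple of c's reduced denominator.
    return lcm(*(f.denominator for f in fracs))


# Concentrations taken from the rndstr.in posted above.
n = min_active_sites(["0.859375", "0.140625"])
print(n)  # 64
```

Any multiple of this count also satisfies the condition, which is consistent with the 128-site case mentioned in the next post.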

cpashartis · March 18, 2025, 12:08am

Thanks, I don’t get that error anymore.

Regarding what you said [quote]
The reason is that you specified a "supercell" of only one unit cell (one active site in this case). You need a larger cell to create an SQS. Also, the number of active sites in it must be such that composition*nb_of_site=integer .
Your composition is not close to a simple fraction, so the required cell may be very large.
[/quote]

So the root cause of the new error is not having a large enough supercell? Even though 0.859375 * 128 = int etc.?

So I’d hate to say it, but it looks like the beta version has a bug. I just copied an entire directory that worked with the stable release and it didn’t work. This might be a bug with using -rc, as I have been testing using previously generated enumeration from your stable code.

Unless of course, I’m doing something wrong.

avdw · March 18, 2025, 12:09am

Yes, sorry, there was a bug (the -rc option was inaccessible). It’s fixed now in the re-uploaded 3.16 version.
-Axel

cpashartis · March 18, 2025, 12:10am

In the newest version I still get the same error as last time, but I can use rc. I just cross checked some simulations and found out that the difference method used in the code to raise the exception (point correlation), seems to give different values with the stable and beta mcsqs. I’m using the exact same files.

I could have done something wrong earlier and it wasn’t caught in the stable version, I’ll try and track down where it is going wrong for you. I’m using the same files as posted in this thread.

(Should we perhaps move this to the bug section?)

avdw · March 18, 2025, 1:23am

The code runs fine here with a sqscell.in containing:

and running

mcsqs -2=6
mcsqs -rc &

Note that there was a change in the format of the sqscell.in file at version 3.07 which may explain your different behavior with the two versions (I’ve added a warning to that effect in the help file).

cpashartis · March 18, 2025, 1:28am

Thanks Axel. I see the change was to force integers (even though I think that was the intended case before).

One question regarding this. Since I want to have the same lattice vectors, I found that if I choose lines 4-6 in rndstr.in to be equal to the vectors in sqscell.out (so that both were in integers) I get the same error as before (point correlation).

I ended up leaving the rndstr.in file as was posted here earlier and simply put the correct integers in the sqscell.out file, it seems to have conserved the lattice structure I wanted. Is this working as intended?

Thanks a lot :), I might be back later for discussion of my results, but I greatly appreciate the help.

cpashartis · March 18, 2025, 1:37am

Hi Axel, when you get a chance to answer these questions it would be appreciated.

Regarding my previous post, I thought I would restate it appropriately. Why is it that you can’t choose the same lattice vectors in sqscell.out as in rndstr.in? Even more unsettling to me is that the supercell volume is conserved in mcsqs no matter which integers are chosen in sqscell.out. Should this be happening?
The more important question - I think I have alluded to this many times but now that I have data to discuss, it is easier addressed. Numerous times I have found throughout the forums, that the best way to choose the best structure is to find the one that has correlations within 5% of the random structure.
To this end, besides the convention of choosing 3rd nearest neighbour for pairs etc., should I just find the largest number of correlations that fit this criterion? Would you agree that the most effective method of finding the difference is to average the correlation differences for a structure and then choose the lowest?
Regarding convention for choosing pair distances, does 3rd nearest neighbour refer to the atomic sites, or to the correlation types (i.e. find the 3 closest clusters for pairs, or apply a search distance of 3rd nearest neighbour to the site). Are you aware of any references that I can use, I have yet to find one?

As usual, it is very much appreciated.

avdw · March 18, 2025, 1:43am

No, this should not be happening. Does it say "Reading supercells…" in the log file?
I don’t know why 5% is a magic number… You can implement your own calc_objective_func in mc_sqs.c++ if you have other preferences.
The pair distances are in the same units of distance that you use in rndstr.in . The code does not know about 1st, 2nd, 3rd nearest neighbors, just distances.
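
Since the cutoffs are plain distances, one way to translate "nth nearest neighbour" into a number is to list the shortest lattice translation lengths. The Python sketch below is only an illustration and not part of ATAT: it assumes the standard ATAT layout (first three lines are the coordinate axes, the next three are lattice vectors expressed in those axes) and only looks at lattice translations, which is enough here because the rndstr.in posted above has a single As/Bi site per unit cell. The name `neighbor_shells` and its parameters are invented for this example.

```python
from itertools import product
from math import sqrt

# Lines 1-3 (axes) and 4-6 (lattice vectors) of the rndstr.in posted above.
AXES = [
    [16.007990, 0.000000, 0.000000],
    [8.003995, 13.863326, 0.000000],
    [8.003995, 4.621109, 13.070469],
]
LATTICE = [
    [0.25, 0.0, -0.25],
    [0.0, 0.0, -0.25],
    [0.0, 0.25, -0.25],
]


def neighbor_shells(axes, lattice, n_shells=3, search=3, tol=1e-3):
    """Return the n_shells shortest distinct lattice translation lengths."""
    # Cartesian lattice vectors: each lattice row is a combination of the axes.
    cart = [
        [sum(row[k] * axes[k][j] for k in range(3)) for j in range(3)]
        for row in lattice
    ]
    lengths = []
    for n in product(range(-search, search + 1), repeat=3):
        if n == (0, 0, 0):
            continue
        vec = [sum(n[i] * cart[i][j] for i in range(3)) for j in range(3)]
        lengths.append(sqrt(sum(x * x for x in vec)))
    lengths.sort()
    shells = []
    for d in lengths:
        # Lengths closer than tol are treated as the same shell.
        if not shells or d - shells[-1] > tol:
            shells.append(d)
        if len(shells) == n_shells:
            break
    return shells


shells = neighbor_shells(AXES, LATTICE)
for rank, d in enumerate(shells, start=1):
    print(f"shell {rank}: {d:.4f}")
```

A cutoff passed to mcsqs (such as the -2=6 used earlier in this thread) can then be compared against the printed distances to see which shells it includes.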

cpashartis · March 18, 2025, 4:25am

It was stuck at
Reading supercells…
Initializing random supercells…
if I remember correctly (I have since removed this data, I can recreate it if you think it would be beneficial).

Other than this case of not running, is it running properly if I leave my rndstr.in file with fractional lattice vectors and give integers corresponding to the direction of lattice vectors;

16.007990 0.000000 0.000000
8.003995 13.863326 0.000000
8.003995 4.621109 13.070469
0.250000 0.000000 -0.250000
-0.000000 0.000000 -0.250000
-0.000000 0.250000 -0.250000
0.062500 0.062500 -0.187500 As=0.859375,Bi=0.140625
0.250000 0.250000 -0.750000 Ga

with

2. I guess I wasn’t clear. What I mean is: if I run a bunch of different clusters, how do I determine which lattice is the best choice? I’ve been going through some of the references you have provided in the general question forum, and I understand that for my three-atom lattice the number of clusters required to converge my objective to the most accurate value should be large. But at what point do I say arrangement A of 15 clusters is more accurate than these 30 clusters, B? Is there any statistical approach, such as 5% (even though shorter-range pairs may contribute more)? Is this clearer (or maybe you answered and I didn’t quite get it)?

I’d like to know since it would be faster than running week long simulations of high computational requirements to check convergence over the number of clusters.

3. I think I’m good here, as this will also depend on 2.

Thanks!

avdw · March 19, 2025, 5:01am

Regarding point 1: I ran your input here and had no problem. Perhaps your input file contains some invisible characters that confuse the parsing code?
This can happen if you used a Windows editor and are running in Unix (Cygwin), for instance.
You can attach your exact files if you need help.
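
A quick way to look for such invisible characters is to scan the raw bytes of the input files. This is a minimal Python sketch, independent of ATAT; the function name `find_suspicious_bytes` is made up for this example. It flags anything that is not printable ASCII, a tab, or a newline, which catches Windows carriage returns and non-ASCII characters such as a byte-order mark.

```python
def find_suspicious_bytes(data: bytes):
    """Return (line, column, byte value) for every byte that is not
    printable ASCII or a tab. Lines are split on the newline byte."""
    problems = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            is_plain = byte == 0x09 or 0x20 <= byte <= 0x7E
            if not is_plain:
                problems.append((lineno, col, byte))
    return problems


# In-memory sample with Windows (CRLF) line endings on the first two lines.
sample = b"1\r\n\r\n4 0 0\n"
for lineno, col, byte in find_suspicious_bytes(sample):
    print(f"line {lineno}, column {col}: byte 0x{byte:02x}")
```

To check a real file, read it in binary mode, for example `open("sqscell.out", "rb").read()`, and pass the result to the function.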

Regarding point 2:
If you want to be really careful, generate a sequence of increasingly bigger SQS (that match progressively more correlations exactly). Calculate the property you want (e.g. energy) and see if it is converging; stop when the results change by less than your tolerance for errors, given your application. This is the most general answer. However, there is often a hard limit on the size of SQS you can run, so that is what you use. Then you just need to optimize the trade-off between many pairs, or fewer pairs but more triplets, etc., again using energy as a criterion, for instance.
One suggestion: it is never useful to fit triplets that extend farther than the pairs you include. See https://arxiv.org/abs/cond-mat/0201511 for such rules.
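
The stopping rule in point 2 can be sketched in a few lines of Python. This is only an illustration: the function name `first_converged` is invented, and the sizes and energies in the usage lines are placeholder numbers, not results from any calculation.

```python
def first_converged(sizes, values, tol):
    """Return the first SQS size at which the property changed by less
    than tol relative to the previous (smaller) SQS, or None if the
    sequence never settles within tol."""
    if len(sizes) != len(values):
        raise ValueError("sizes and values must have the same length")
    for i in range(1, len(values)):
        if abs(values[i] - values[i - 1]) < tol:
            return sizes[i]
    return None


# Placeholder data: SQS sizes (atoms) and a computed property per size.
sizes = [16, 32, 64, 128]
energies = [1.0, 0.6, 0.55, 0.549]
print(first_converged(sizes, energies, tol=0.1))  # 64
```

Comparing only consecutive sizes is the simplest choice; a stricter variant would require the change to stay below the tolerance for two or more successive sizes.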

cpashartis · March 19, 2025, 5:18am

Hi Axel,

In general, how would one calculate the error coming from the MCSQS?

Referring to point 2, I might just have to do this, but I was trying to avoid the computational time as the energy band calculations we are doing take about a week. When doing these calculations, at what point would you consider the SQS portion converged enough to plug into the simulation? I’m finding seed numbers can make a drastic difference (maybe I need to play with the Temperature variable?).

Do you think using the Cross Verification formula might help alleviate the problems (do you have this built in somewhere, as it uses internal data)?

As an example, on the most rudimentary way I could think of checking the accuracy of the SQS before using it (since I’m finding issues uploading stuff):

https://drive.google.com/open?id=0B0Vv99dM6so_SjFrZTgzSU5xQVE
The individual lines represent recorded objective function changes. The light green region is the final ‘zone’ with the maximum and minimum % difference in correlation recorded in the top left. The first region contains pairs, the second triplets, the third quadruplets (though I’ll have to look into the paper you mentioned).

avdw · March 19, 2025, 4:31pm

Unfortunately I don’t think there exists an "internal check" of the accuracy of an SQS. One typically sets the maximum size that is computationally tractable and then looks for the best SQS within that constraint. The trade-off between more pairs or more multibody correlations is not obvious to decide on. Note that early references on SQS only focused on pairs…
In your figure: I am not sure percent difference in correlation is the right metric since correlations are dimensionless and lie in [-1,1].
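
One alternative to a percent difference is to report absolute differences between the SQS correlations and the target (random alloy) correlations, which stays well defined even when a target correlation is zero. The Python sketch below is just one possible metric and not something mcsqs provides; the function name `correlation_mismatch` is invented, and the two lists are assumed to have been read from the output files beforehand.

```python
from math import sqrt


def correlation_mismatch(sqs_corr, target_corr):
    """Return (largest absolute difference, RMS difference) between the
    SQS correlations and the target correlations, cluster by cluster."""
    if len(sqs_corr) != len(target_corr) or not sqs_corr:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [s - t for s, t in zip(sqs_corr, target_corr)]
    max_abs = max(abs(d) for d in diffs)
    rms = sqrt(sum(d * d for d in diffs) / len(diffs))
    return max_abs, rms


# Illustrative values only; the second target is zero, where a
# percent difference would be undefined.
max_abs, rms = correlation_mismatch([0.5, 0.25], [0.5, 0.0])
print(max_abs, rms)
```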