Problem with uloop

Dear Lammps users,

I am running a LAMMPS calculation in which I use a loop variable to run 52316 calculations with the same script (see attached input file). I am using version 2014_01_10. The idea is to remove one oxygen atom from my system, optimize its structure, get the energy, and repeat this for each of the 52316 oxygen atoms in my system. I have run this type of calculation routinely and never had a problem… until now!

What seems to happen is that LAMMPS somehow loses count of the loop variable: it is running, say, iteration 13881, and the next value is 12 rather than 13882. See the output below:

Increment via next: value 13877 on partition 11
Increment via next: value 13878 on partition 26
Increment via next: value 13879 on partition 10
Increment via next: value 13880 on partition 3
Increment via next: value 13881 on partition 15
Increment via next: value 12 on partition 31
Increment via next: value 13 on partition 21
Increment via next: value 14 on partition 8
Increment via next: value 15 on partition 27
Increment via next: value 16 on partition 25

I am not sure why this is happening… I have never had this problem before, and I have run several such calculations.

Any idea?

Thanks,

D

This sounds like a problem people occasionally
see when using universe or uloop variables
and running on multiple partitions. The handshaking
between the set of partitions is done thru a file.
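To make the file-based handshake concrete, here is a minimal Python sketch. It assumes a hypothetical plain-text file holding the last claimed loop index, and each partition claims an index by reading the file, incrementing the value, and writing it back. This is illustrative only, not the code in src/variable.cpp. The `interleaved_claims` function replays, in a single process, two partitions whose read and write steps overlap, showing how an unprotected read-modify-write lets both claim the same index.

```python
import os
import tempfile


def read_index(path):
    """Return the integer stored in the (hypothetical) shared counter file."""
    with open(path) as f:
        return int(f.read().strip())


def write_index(path, value):
    """Overwrite the shared counter file with a new integer."""
    with open(path, "w") as f:
        f.write(f"{value}\n")


def claim_next(path):
    """Unlocked read-modify-write: claim the next index for one partition."""
    value = read_index(path) + 1
    write_index(path, value)
    return value


def interleaved_claims(path):
    """Deterministically replay two partitions with overlapping steps.

    Both partitions read the file before either one writes, so both compute
    the same 'next' value: one iteration is handed out twice and the counter
    advances by only one.
    """
    a_seen = read_index(path)      # partition A reads
    b_seen = read_index(path)      # partition B reads the same value
    write_index(path, a_seen + 1)  # A writes its increment
    write_index(path, b_seen + 1)  # B overwrites it with the same increment
    return a_seen + 1, b_seen + 1, read_index(path)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    write_index(path, 13880)
    print(claim_next(path))          # 13881
    print(interleaved_claims(path))  # (13882, 13882, 13882)
    os.remove(path)
```

Here the race only produces a duplicated index; how a real collision corrupts the file depends on the exact timing and file system, so this sketch does not claim to reproduce the jump back to 12.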

The code in src/variable.cpp (line ~530) attempts
to lock this file so that only one partition (process) can update
it at a time. But it turns out there is no good
way to do that in a Linux file system. We tried
various options and none were bulletproof. The
best we could come up with was to introduce some randomized
delays to minimize the chance that 2 or more partitions
hit the file at the same instant.
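A minimal Python sketch of a lock with randomized delays, assuming that renaming a hypothetical counter file to a `.lock` name serves as the lock. This is an illustrative choice, not necessarily what src/variable.cpp does. At any moment only one of the two names exists, so only one partition's rename can succeed. The others find the file missing, sleep a random interval, and retry.

```python
import os
import random
import time


def claim_next_locked(path, max_delay=0.01, rng=None):
    """Claim the next index, using rename() of the counter file as a lock.

    Exactly one of `path` and `path + ".lock"` exists at a time. A partition
    takes the lock by renaming the counter file; if the file is already gone,
    another partition holds the lock, so wait a random time and retry.
    """
    rng = rng or random.Random()
    lock = path + ".lock"
    while True:
        try:
            os.rename(path, lock)  # only one contender can win this rename
            break
        except FileNotFoundError:
            time.sleep(rng.uniform(0.0, max_delay))  # randomized back-off
    try:
        with open(lock) as f:
            value = int(f.read().strip()) + 1
        with open(lock, "w") as f:
            f.write(f"{value}\n")
    finally:
        os.rename(lock, path)  # release the lock by restoring the counter file
    return value
```

The scheme is only as reliable as the atomicity of `rename()` on the file system in use. On a single local POSIX file system it is atomic, but those guarantees may not hold as strongly on networked file systems shared across cluster nodes. This is consistent with the caveat above that none of the options were bulletproof.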

If that logic fails, the index in that file can get messed
up, and the result is something like you see.

If anyone has a better solution, we’re open to implementing
it.

I’m guessing your individual calculations are very fast
which exacerbates this problem. If you can introduce
some random variation in the length of your runs, it
might help.

Steve

Thank you, Steve. I re-ran the calculation and it worked. If I run into this issue again, I will try adding some random variation to the length of the runs.

Best,

D