Lammps error that I do not understand: ERROR: Invalid run command upto value (run.cpp:114); restart file does not work; way to convert .dump to restart file?

Hello,

My LAMMPS jobs have stopped for a reason that I do not understand.
My LAMMPS input file contains the lines:

"""
variable RunNumber equal 14083200000.0
variable PrintStep equal 3200000.0

variable DumpFile string "chain_000.dump"

variable RestartFile string "chain.restart"

variable Restart equal 0

.
.
.

if "${Restart} > 0" then "read_restart ${RestartFile}" else "read_data ${InputFile}"

restart ${PrintStep} ${RestartFile} ${RestartFile}

timestep ${TimeStep}
thermo ${PrintStep}
dump 4bis all atom ${PrintStep} ${DumpFile}
dump_modify 4bis scale no
fix 5 GroupLangevin langevin ${Temp} ${Temp} ${Damping} ${Seed}

run 2147483647.0 upto
run 4294967294.0 upto
run 6442450941.0 upto
run 8589934588.0 upto
run 10737418235.0 upto
run 12884901882.0 upto
run ${RunNumber} upto

"""

Everything works as expected, but when the iteration reaches 12884901882 steps, the program stops saying:

"""
.
.
.

12883200000 0.00050569346 0 27.101733 27.102488 3.2846451e-05 432539.86
12884901882 0.00050511246 0 27.101517 27.102271 2.7622769e-05 432539.86
Loop time of 475507 on 1 procs for 2147483647 steps with 200 atoms

Pair time (%) = 13224.4 (2.78111)
Bond time (%) = 273052 (57.4234)
Neigh time (%) = 1.8723 (0.000393749)
Comm time (%) = 1272.36 (0.267579)
Outpt time (%) = 1.36163 (0.000286353)
Other time (%) = 187955 (39.5272)

Nlocal: 200 ave 200 max 200 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Ave special neighs/atom = 5.94
Neighbor list builds = 254
Dangerous builds = 0
run ${RunNumber} upto
run 1.40832e+10 upto
ERROR: Invalid run command upto value (run.cpp:114)
"""

The evil thing is that, in addition, my restart file does not work. I do not know why; I have used several LAMMPS input files that write the same restart file with this line:

restart ${PrintStep} ${RestartFile} ${RestartFile}

and the (one and only) restart file was always perfectly usable.

But here, when I try to use the restart file, LAMMPS claims that I have fewer atoms than the original amount (I have 200 atoms, and the restart file contains between 196 and 199 atoms; I have a number of similar LAMMPS runs that all stop at the same position).

So, the question is: can I repair my restart file using the dump file somehow? Can I generate a restart file from the last (correctly written) state in the dump file?

And, what exactly is the problem that lammps has when starting the last run command?

Many Thanks for all help!

the explanation here is simple.

the run command expects an integer,
but your variable is producing a floating
point number. that is fine for as long as
it is printed in plain decimal format, but breaks
as soon as it switches to exponential format.

The evil thing is that, in addition, my restart file does not work. I do not know why; I have used several LAMMPS input files that write the same restart file with this line:

restart ${PrintStep} ${RestartFile} ${RestartFile}

and the (one and only) restart file was always perfectly usable.

But here, when I try to use the restart file, LAMMPS claims that I have fewer atoms than the original amount (I have 200 atoms, and the restart file contains between 196 and 199 atoms; I have a number of similar LAMMPS runs that all stop at the same position).

So, the question is: can I repair my restart file using the dump file somehow? generate a restart file from the last (correctly written) state in the dump file?

please produce a proper (and minimal) input deck, that reproduces
this behavior. "it doesn't work" is a useless bug report.

losing atoms is almost always an indicator of a simulation gone wrong.

axel.

Thank you for the response, I thought it might be linked to the float format.

I am afraid that I am not able to produce a simple program that would reproduce this error.
It only happens for runs of this kind, which ran for a long time and then, in the last run command, are given a float and stop because of it.

I do have two questions though:
1)
The exponential writing in the screen.lammps file has nothing to do with me; all my numbers
are written out as integers (although with .0 at the end).

Why then does lammps not have any problems swallowing

run 12884901882.0 upto

, but it chokes on

variable RunNumber equal 14083200000.0
run ${RunNumber} upto

?

2)
It is not I or lammps that did lose the atoms. They are perfectly there in the dump file, no peculiarity can be seen. They are just lacking in the .restart file.
It took days to calculate these runs as far as they are, is there no way to restart the run (with proper integers as the RunNumber) using the .dump file?

Thanks for any help!

Thank you for the response, I thought it might be linked to the float format.

I am afraid that I am not able to produce a simple program that would reproduce this error.
It only happens for runs of this kind, which ran for a long time and then, in the last run command, are given a float and stop because of it.

so what? just set up a silly calculation that has only
a small number of atoms. the point is that it takes
time and effort to do this and since debugging by itself
is time consuming, any *additional* effort required to
track down a problem, would just make it less likely
that somebody would spend time on it.

that being said, in the case of time steps, you can
easily use, for example, the reset_timestep command to skip
over the initial simulation period.

I do have two questions though:
1)
The exponential writing in the screen.lammps file has nothing to do with me; all my numbers
are written out as integers (although with .0 at the end).

Why then does lammps not have any problems swallowing

run 12884901882.0 upto

, but it chokes on

variable RunNumber equal 14083200000.0
run ${RunNumber} upto

because that expands to:

Dangerous builds = 0
run ${RunNumber} upto
run 1.40832e+10 upto
ERROR: Invalid run command upto value (run.cpp:114)

you have to look *properly* at the output.

why can't you just use reset_timestep and
run each chunk of the simulation with timestep
numbers starting from 0?

?

2)
It is not I or lammps that did lose the atoms. They are perfectly there in the dump file, no peculiarity can be seen. They are just lacking in the .restart file.

how do you prove that? it may very well be something in the
input where you read in the restart. you won't be the first
person to have made mistakes there.

It took days to calculate these runs as far as they are, is there no way to restart the run (with proper integers as the RunNumber) using the .dump file?

why this obsession with the time step number?
until not so long ago, this wasn't possible at all with
lammps and people were still able to do their work.

if you suspect the time step number messing up
things, you can easily prove that by just setting up
a simulation and then using reset_timestep to crank
up the time step number close to whatever gives you
the problem. at the moment, there is no conclusive
evidence that there is a bug in lammps.

axel.

, but it chokes on

variable RunNumber equal 14083200000.0
run ${RunNumber} upto

because that expands to:

Dangerous builds = 0
run ${RunNumber} upto
run 1.40832e+10 upto
ERROR: Invalid run command upto value (run.cpp:114)

BTW: the reason is the way the %g format works.
a simple workaround that makes larger step numbers
look like integers as well would be this change.

diff --git a/src/variable.cpp b/src/variable.cpp
index af2e708..134d3c5 100644
--- a/src/variable.cpp
+++ b/src/variable.cpp
@@ -455,9 +455,9 @@ char *Variable::retrieve(char *name)
     strcpy(data[ivar][0],result);
     str = data[ivar][0];
   } else if (style[ivar] == EQUAL) {
-    char result[32];
+    char result[64];
     double answer = evaluate(data[ivar][0],NULL);
-    sprintf(result,"%.10g",answer);
+    sprintf(result,"%.20g",answer);
     int n = strlen(result) + 1;
     if (data[ivar][1]) delete [] data[ivar][1];
     data[ivar][1] = new char[n];

cheers,
    axel.

Just made these changes to avoid this integer <-> float
issue. Will be in the next patch.

Steve