Exact Restart Long-Time Behavior

Martin_Brehm · January 11, 2019, 8:55pm

Dear Community,

I performed some experiments with writing and reading binary restart files on a relatively simple test system (an ionic liquid). I ran the simulation once in a single piece and once with a restart in the middle of the run. Then I compared the trajectories and potential energy time series.

With a truly "exact" restart, both trajectories should be identical. However, I observed that while the first few hundred steps after the restart are identical, the trajectories start to differ significantly after around 1000 steps.
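A comparison of this kind can be sketched in a few lines of Python. This is only an illustration: the function name `first_divergence` and the energy values are made up, and it assumes both time series have already been read into lists with one value per step.

```python
def first_divergence(reference, restarted, tol=0.0):
    """Return the index of the first step where the two series differ
    by more than tol, or None if they agree over the compared range."""
    for step, (a, b) in enumerate(zip(reference, restarted)):
        if abs(a - b) > tol:
            return step
    return None


# Hypothetical potential energies (not taken from any real run).
full_run = [-10.0, -10.5, -11.0, -11.2, -11.3]
restart_run = [-10.0, -10.5, -11.0, -11.2000001, -11.4]

print(first_divergence(full_run, restart_run))            # 3
print(first_divergence(full_run, restart_run, tol=1e-6))  # 4
```

With `tol=0.0` the check is bitwise-strict in the sense of an "exact" restart; a small positive tolerance instead reports the step at which the difference becomes visible.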

I know that an MD simulation is a good example of a chaotic system, and that tiny changes in positions/velocities lead to dramatic changes at later times. However, if the restart file contains the binary representations of all floating point numbers, I would not expect any deviation at all...

So my question is whether it should be possible to have an "exact" restart with LAMMPS, such that the full trajectory after restarting is strictly equal to the trajectory without restarting.

The following things I already checked:

(*) I don't use a thermostat (only "fix nve"), so it can't be related to a wrongly restarted internal thermostat state. There are no other fixes apart from "nve", so no fix needs to be restarted.

(*) I don't use long-range methods in the test runs (only lj/cut/coul/cut), so it's not related to pppm.

(*) It also happens in a serial LAMMPS version, so it has nothing to do with MPI parallelism.

The things one normally evaluates from an MD simulation (RDFs, MSD, etc.) will not be affected by this finding... So if you tell me that this behavior is correct, I will just accept it. If, however, a truly "exact" restart should be possible in theory, then I will provide some more details here to find the cause of the problem.

Best regards,
Martin

akohlmey · January 11, 2019, 9:39pm

Martin,

There is one item you have not considered: changes in the order of summation of forces.

This can happen when neighbor lists are updated and is due to the fact that floating point math is not associative.

With restarts, the time steps at which the neighbor lists are updated are typically not the same as in the uninterrupted run (unless you update them at every step). Also, with your ionic liquid system, you may have chosen a system that is quite likely to show divergence rather quickly.
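The effect of summation order can be illustrated with a minimal Python sketch. The numbers and the helper `accumulate` are purely hypothetical stand-ins for per-pair force contributions; this is not how LAMMPS itself accumulates forces, only a demonstration that the same terms added in a different order can give a different floating-point result.

```python
def accumulate(contributions, order):
    """Add up the contributions in the given index order."""
    total = 0.0
    for i in order:
        total += contributions[i]
    return total


# Hypothetical force contributions on one atom from three neighbors.
pair_forces = [0.1, 0.2, 0.3]

# Same terms, two different neighbor orderings.
forward = accumulate(pair_forces, [0, 1, 2])
reverse = accumulate(pair_forces, [2, 1, 0])

print(forward)             # 0.6000000000000001
print(reverse)             # 0.6
print(forward == reverse)  # False
```

A difference in the last bit like this is harmless for a single step, but in a chaotic system it grows over subsequent steps until the trajectories visibly separate.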

Axel.

Martin_Brehm · January 12, 2019, 11:59am

Hi Axel,

Thanks, that was exactly the remaining issue. With "neigh_modify every 1 delay 0 check no", the trajectories are identical for the full simulation time. (I am of course not going to use these settings for my regular runs.)

I have one remaining question. If I restart a simulation and use "dump_modify append yes", then the step in which I restart will be written twice to the dump file (once as the last step in the first run, and once as the first step in the second run). Is there some trick to prevent this from inside LAMMPS, or do I need to filter out the duplicate steps afterwards? "dump_modify first no" seems to be unsuitable for this purpose if I want to write every step to the trajectory.
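For the "filter afterwards" route, a minimal Python sketch could look like the following. It assumes the standard text dump layout, in which every frame starts with a line "ITEM: TIMESTEP" followed by a line holding the step number; the function names, the file names in the usage comment, and the tiny single-atom frames are made up for illustration.

```python
def drop_duplicate_frames(lines):
    """Yield the lines of a text dump, skipping every frame whose
    timestep has already been seen earlier in the stream."""
    seen_steps = set()
    keep = True  # lines before the first frame header are passed through
    lines = iter(lines)
    for line in lines:
        if line.startswith("ITEM: TIMESTEP"):
            step_line = next(lines, None)
            if step_line is None:  # truncated file: header without a step
                break
            step = int(step_line.split()[0])
            keep = step not in seen_steps
            seen_steps.add(step)
            if keep:
                yield line
                yield step_line
        elif keep:
            yield line


def make_frame(step):
    """Build one tiny single-atom frame for the demonstration (made-up data)."""
    return [
        "ITEM: TIMESTEP\n", f"{step}\n",
        "ITEM: NUMBER OF ATOMS\n", "1\n",
        "ITEM: BOX BOUNDS pp pp pp\n", "0 10\n", "0 10\n", "0 10\n",
        "ITEM: ATOMS id type x y z\n", "1 1 0.0 0.0 0.0\n",
    ]


# Step 100 appears twice, as it would after a restart with appending.
appended = make_frame(0) + make_frame(100) + make_frame(100) + make_frame(200)
cleaned = list(drop_duplicate_frames(appended))
steps = [int(cleaned[i + 1]) for i, line in enumerate(cleaned)
         if line.startswith("ITEM: TIMESTEP")]
print(steps)  # [0, 100, 200]

# Usage on real files (hypothetical names):
# with open("traj.lammpstrj") as src, open("traj_clean.lammpstrj", "w") as dst:
#     dst.writelines(drop_duplicate_frames(src))
```

Because the function works line by line on any iterable, it can stream through a large file without loading it into memory; only the set of seen step numbers is kept.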

Best regards,
Martin

akohlmey · January 12, 2019, 12:21pm

> Hi Axel,
>
> [...]
>
> I have one remaining question. If I restart a simulation, and use
> "dump_modify append yes", then the step in which I restart will be

I strongly advise against using this option. In my opinion, having
multiple trajectory files has many benefits: the risk of file
corruption is smaller; if a simulation terminates unexpectedly for
some external reason (hardware issues, misjudged wall time, etc.), it
is much easier to continue cleanly; smaller files are easier to manage
and transfer; and it is easy to combine them later, if needed
(skipping over redundant frames along the way).

> written twice to the dump file (once as last step in the first run, and
> once as first step in the second run). Is there some trick to prevent
> this from inside LAMMPS, or do I need to filter out the duplicate steps
> afterwards? "dump_modify first no" seems to be unsuitable for this
> purpose if I want to write every step to trajectory.

Writing every step to a trajectory file is extremely wasteful and
ill-advised. Since configurations in neighboring frames are so strongly
correlated, their statistical relevance for analysis is very low.
Any output interval shorter than 1000 MD steps (at an optimally
chosen MD time step) falls into that category. Any post-processing
that would require access to frames that frequently is much better added
to the simulation software directly. Writing and parsing trajectory
files, especially in text format, requires a substantial amount of
processing power (lots of time-consuming calls to exponential and
logarithm functions), not to mention that it negatively impacts
parallel scaling performance.

axel.