starting the simulation from the same restart file

Rolf_Erwin_Isele-Hol · May 7, 2014, 1:05pm

Dear all,

I started the same simulation from a restart file twice. In the simulation,
I do not use any features that should influence the simulations in some
statistical way, e.g. Langevin thermostats. I ran the two simulations
on the same machine (BG/Q).

The only thing that is different between the two jobs, to the best of my
knowledge, is the way in which the simulations are spread across the
nodes of the machine.

Would you expect the simulations to provide *identical* results for the described
scenario? In my simulations the thermodynamic output in the first step is the
same for both simulations, but the second time that I print out the results after
1 ps of simulation the values are different between the two simulations.

Best,
Rolf

sjplimp · May 7, 2014, 1:21pm

Is your restart “file” in multiple files or in MPI-IO
format? If so, read_restart invokes irregular
communication to get the atoms where they
need to be (as do several other commands;
grep for Irregular).

For irregular comm, the order that messages arrive
at a proc can be random, due to timing of messages.
That will result in different atom orderings, which is
like a round-off effect which diverges trajectories.
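To illustrate the point about atom ordering, here is a minimal Python sketch (not LAMMPS code). The per-atom energies, the rank numbers, and both helper functions are made up for the example; the only assumption is that atoms are appended in the order their messages arrive and then summed in that order.

```python
def unpack_in_arrival_order(messages, arrival_order):
    """Append each sender's atoms in the order its message arrived."""
    local = []
    for rank in arrival_order:
        local.extend(messages[rank])
    return local

def sum_left_to_right(values):
    """Plain sequential float accumulation, like a simple loop over atoms."""
    total = 0.0
    for v in values:
        total += v
    return total

# Hypothetical per-atom energies sent by three processors (made-up numbers).
messages = {0: [0.1], 1: [0.2], 2: [0.3]}

# Two runs that differ only in the order the messages happened to arrive.
run_a = unpack_in_arrival_order(messages, [0, 1, 2])
run_b = unpack_in_arrival_order(messages, [2, 1, 0])

sum_a = sum_left_to_right(run_a)
sum_b = sum_left_to_right(run_b)

print(sorted(run_a) == sorted(run_b))  # True: same atoms in both runs
print(sum_a == sum_b)                  # False: totals differ in the last bit
print(abs(sum_a - sum_b))              # size of the round-off difference
```

Both runs hold the same atoms, but the accumulated total differs at round-off level simply because the storage order changed.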

As a test, the randomization can be turned “off” if
you look in irregular.cpp for “debug” and uncomment
a line.

Steve

akohlmey · May 7, 2014, 1:24pm

it's been discussed in several places, including this mailing list.

indeed, immediately after a restart, you should see identical numbers,
however, anything that changes the order of how forces and energies are
summed up (different number of MPI tasks, different number of OpenMP
threads, different compiler or same compiler with different optimizations,
and so on) will result in small changes due to floating point number
truncation and the fact that floating point math is not associative. add to
that the fact that MD is a chaotic system, and you will see an exponential
divergence between trajectories that will then go on to become decorrelated
and sample equivalent but different sections of phase space.
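The exponential divergence can be seen in a toy model. The sketch below uses the logistic map as a stand-in for a chaotic system; it is not MD, and the starting value, the perturbation of 1e-12 (standing in for an accumulated round-off difference), and the function names are all chosen just for illustration.

```python
def logistic(x):
    """One step of the chaotic logistic map x -> 4x(1 - x)."""
    return 4.0 * x * (1.0 - x)

def max_separation(x0, delta, steps):
    """Largest distance between two trajectories that start `delta` apart."""
    x, y = x0, x0 + delta
    largest = abs(x - y)
    for _ in range(steps):
        x, y = logistic(x), logistic(y)
        largest = max(largest, abs(x - y))
    return largest

# Two "simulations" whose initial states differ by a round-off-sized amount.
short = max_separation(0.3, 1e-12, steps=5)
long = max_separation(0.3, 1e-12, steps=100)

print(short)  # still tiny after a few steps
print(long)   # of order one: the trajectories have decorrelated
```

After a handful of steps the two trajectories are still indistinguishable in printed output; after enough steps they are unrelated, even though both remain valid trajectories of the same system.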

axel.

Rolf_Erwin_Isele-Hol · May 8, 2014, 5:14am

Hi,

thanks for your replies.
Also, sorry for not checking the mailing list archives first.

Best,
Rolf

sjplimp · May 8, 2014, 2:54pm

I just made a change (next patch) for reading
restart files to have it invoke the sorting option
within the irregular comm, which means it
should give reproducible (not randomized) behavior.
Randomized is faster comm, but speed doesn’t
matter for this, so I think reproducible is better.
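As a rough picture of why sorting gives reproducible behavior, here is a minimal Python sketch. It is not the LAMMPS implementation: the `unpack` function, the `sort_by_sender` flag, and the choice of sender rank as the sort key are assumptions made for the example.

```python
def unpack(messages, sort_by_sender=False):
    """Build the local atom list from (sender_rank, atom_ids) messages.

    `messages` is given in arrival order. With sort_by_sender=True the
    messages are reordered by sender rank before unpacking, so the
    result no longer depends on message timing.
    """
    msgs = list(messages)
    if sort_by_sender:
        msgs.sort(key=lambda m: m[0])
    atoms = []
    for _, atom_ids in msgs:
        atoms.extend(atom_ids)
    return atoms

# The same three messages, arriving in a different order in each run.
arrival_a = [(0, [11, 12]), (1, [21]), (2, [31, 32])]
arrival_b = [(2, [31, 32]), (0, [11, 12]), (1, [21])]

unsorted_same = unpack(arrival_a) == unpack(arrival_b)
sorted_same = unpack(arrival_a, True) == unpack(arrival_b, True)

print(unsorted_same)  # False: atom order depends on arrival order
print(sorted_same)    # True: sorting removes the timing dependence
```

The sort adds a little work per exchange, which matches the trade-off described above: the randomized path is faster, the sorted path is reproducible.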

If you were reading the restart file as I indicated
and invoking irregular comm, then the differences
you were seeing should disappear. If you are not doing
what I indicated, and are still seeing differences when
restarting on the same # of procs, then I don’t think
that should happen, assuming your restart script
does not have some other command that leads to
round-off differences and diverging trajectories.

Steve