Bug when rebuilding neighbour lists?

Hello all,

A PhD student here noticed that the results of some granular simulations he was running seemed to vary depending on the skin distance selected. After looking into it a bit more, it appears that the frequency at which the neighbour lists are rebuilt affects the results, when this should not be the case. I attach two files, a simple input script (in.test) and a data file of particle positions and velocities, which demonstrate the problem. This script compares two cases: rebuilding the neighbour lists based on the usual skin distance check, and rebuilding the neighbour lists on every timestep. There are substantial differences between the output dump files after 100000 timesteps. I've been unable to track down the reason for this disparity, but I have established that the problem occurs using a serial build of the latest version of LAMMPS and is unaffected by shear history.

I’ve also noticed that a similar divergence of results occurs when running the 3D Lennard-Jones melt example given in examples/melt. If the following two cases are compared:

neigh_modify every 1 delay 0 check no

neigh_modify every 1 delay 0 check yes

a divergence occurs in the thermo output within 1000 timesteps. Since the same machine and random number seed are used, my understanding is that these results ought to be identical. Perhaps these two issues are unrelated, but the former is causing a problem, as some outputs of our granular simulations appear to be highly sensitive to the skin distance.

Regards,

Kevin Hanley

Postdoctoral researcher

Dept. of Civil and Environmental Engineering

Imperial College London, SW7 2AZ

in.test (995 Bytes)

Particle_Coordinates.lj (77.2 KB)

> After looking into it a bit more, it appears that the frequency at which
> the neighbour lists are rebuilt affects the results, when this should
> not be the case.

The neighbor list rebuild affects the order in which forces are summed up. Because floating-point math is not associative, and because truncation happens depending on the magnitude of the value that a force is being added to, *every* MD code using floating point will diverge, and it will diverge more quickly when you shuffle around the order in which forces are summed. Neighbor list rebuilds also exchange atoms between domains, which increases the divergence in parallel calculations. There is also a facility to sort atoms according to spatial vicinity, which likewise changes the order of atoms.
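
For reference, this spatial sorting is controlled by the atom_modify command. A minimal sketch, using what should be the default values:

atom_modify sort 1000 0.0      # the default: re-sort atoms every 1000 steps, default bin size
# a sort frequency of 0 disables spatial sorting altogether:
# atom_modify sort 0 0.0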

Due to the chaotic nature of MD, the divergence is exponential.

The only thing that has to be watched carefully is the output signaling "dangerous builds", as those are an indication of too small a neighbor list skin or too infrequent forced updates.
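
For reference, LAMMPS reports this count in the post-run statistics as "Dangerous builds". A conservative sketch that should keep that count at zero (the skin value is illustrative, not a recommendation):

neighbor     0.3 bin
neigh_modify every 1 delay 0 check yes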

> I've also noticed that a similar divergence of results occurs when
> running the 3D Lennard-Jones melt example given in examples/melt. If the
> following two cases are compared:
>
> neigh_modify every 1 delay 0 check no
>
> neigh_modify every 1 delay 0 check yes
>
> a divergence occurs in the thermo output within 1000 timesteps. Since
> the same machine and random number seed are used, my understanding is
> that these results ought to be identical. Perhaps these two issues are
> unrelated, but the former is causing a problem, as some outputs of our
> granular simulations appear to be highly sensitive to the skin distance.

Yes, and for the same reason. You would have to re-implement LAMMPS to do fixed-point math to avoid this (and not use a thermostat, either). About 1000 timesteps with a well-chosen time step is a typical point at which you start to see the kind of divergence I was describing above.

This is all normal and should be covered in textbooks about the numerical integration of coupled non-linear ordinary differential equations (i.e. MD).
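
For anyone who wants to reproduce this, here is a sketch along the lines of examples/melt/in.melt, with the run length extended and the two rebuild settings left in place for swapping (the thermo interval and run length are arbitrary choices, not those of the original example):

# 3d Lennard-Jones melt, following examples/melt/in.melt
units        lj
atom_style   atomic
lattice      fcc 0.8442
region       box block 0 10 0 10 0 10
create_box   1 box
create_atoms 1 box
mass         1 1.0
velocity     all create 3.0 87287 loop geom
pair_style   lj/cut 2.5
pair_coeff   1 1 1.0 1.0 2.5
neighbor     0.3 bin
# swap between these two lines to see the thermo output diverge:
neigh_modify every 1 delay 0 check no
#neigh_modify every 1 delay 0 check yes
fix          1 all nve
thermo       100
run          1000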

cheers,
     axel.

Dear Axel,

Thank you very much for your reply and explanation. What I find surprising is the large effect of this divergence on certain macro-responses. In some cases, it seems that the variation due to changing the skin distance, and hence the neighbour list rebuild frequency, is large enough to outweigh the effect of changing other simulation parameters. Therefore it becomes meaningless to quantify those responses which are known to display strong sensitivity to the skin distance.

Regards,

Kevin

> Therefore it becomes meaningless to quantify those responses which are
> known to display strong sensitivity to the skin distance.

No, they are not meaningless at all, and those exponential changes are just a manifestation of the chaotic nature of MD. *Many* input parameters will cause them. Have you tried changing the time step? Or the number of processors?

While the trajectories will diverge, they are not "wrong". There simply is not just one "right" trajectory (or rather state) but an ensemble of equivalent states that belong to the same region in phase space. One of the major problems in MD (or MC) is that you won't get an exhaustive integration of your phase space (there are far too many degrees of freedom for that); rather, your trajectory samples a small subset, and you *hope* that it covers a sufficiently large part of the relevant region with the proper weighting. That is not always guaranteed and not at all easy to prove.

A reproducible trajectory can be as "bad" as one that is less reproducible, i.e. it can have good or bad sampling of the relevant phase space. That is why people often create multiple, decorrelated starting conditions, so that they can get good statistical sampling. You can also improve your sampling by increasing the size of the sample (though that may be more expensive, particularly with long-range interactions), or by running a system in equilibrium for a longer time.
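
One way to put the multiple-starting-conditions advice into practice in LAMMPS is a loop over velocity seeds, so that each replica starts from a decorrelated state. A sketch, where in.setup is a hypothetical include file holding the system-definition commands of the original script, and the seed and temperature values are arbitrary:

# run four replicas that differ only in the initial velocity seed
variable seed index 12345 23451 34512 45123
label    runs
clear
include  in.setup              # hypothetical file: units, read_data, pair_style, fixes, ...
velocity all create 1.0 ${seed}
run      100000
next     seed
jump     SELF runs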

This is all standard statistical mechanics textbook material and should be deeply engraved in the brain of any student who uses MD as a research tool. But sadly, SM is not always as easily digestible as other topics, so many try to avoid it and are then surprised to discover "undocumented features" that are simply a consequence of the nature of the underlying physics, amplified by the shortcomings of numerical integration and floating-point math. ;-)

cheers,
     axel.

> Therefore it becomes meaningless to quantify those responses which are
> known to display strong sensitivity to the skin distance.

I'll say this another way. Unless there is a bug in the reneighboring (which there could be, since granular neighbor lists store shear history, and that is tricky to maintain across reneighborings, but I don't think there is a bug), then so long as you reneighbor often enough, the skin distance can make no difference whatsoever to the correctness of the simulation. However, it will lead to statistical variation due to the divergence of trajectories from numerical round-off, as Axel has explained. If the property you are measuring is sensitive to that, then that's just the way it is.
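
One way to probe the shear-history caveat directly is to repeat the skin-distance comparison with a history-free granular pair style. A sketch (the coefficients are illustrative values in the style of the LAMMPS examples, not those of in.test):

pair_style gran/hooke/history 200000.0 NULL 50.0 NULL 0.5 1
#pair_style gran/hooke        200000.0 NULL 50.0 NULL 0.5 1   # same model, no stored shear history

If the divergence persists with gran/hooke, shear-history transfer across reneighborings can be ruled out as the cause, which is consistent with what Kevin reported.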

Steve