possible bug with pair style of hybrid

yijin_mao1 · April 22, 2016, 12:09am

Dear LAMMPS Developers:

I would like to report a possible bug related to pair_style hybrid.

In short, the simulation results differ if one simply switches the order of the sub-styles given to pair_style hybrid.

For your convenience, two nearly identical input files (only the pair_style hybrid line differs) are attached. You can easily reproduce the problem by following the brief instructions in README.txt.

How to reproduce this bug:

step 1. go to test1 and run ‘Run.sh’ if you have ‘lmp_mpi’ installed (LAMMPS (16 Feb 2016))
step 2. go to test2 and run ‘Run.sh’ if you have ‘lmp_mpi’ installed (LAMMPS (16 Feb 2016))

NOTE: the ONLY difference between test1/run.in and test2/run.in is in the 5th line, where the order of the sub-styles is switched.
test1 has “pair_style hybrid lj/cut/coul/long 13.0 lj/cut 13.0”, lj/cut/coul/long appears first
test2 has “pair_style hybrid lj/cut 13.0 lj/cut/coul/long 13.0”, lj/cut appears first

step 3. plot the second column (temp) against the step from test1/log.screen and test2/log.screen; you will get a figure like the one included in this folder.

I expected no difference between these two cases; however, after 10,000 steps, a visible difference appears.

This problem can be reproduced with LAMMPS version ‘LAMMPS (16 Feb 2016)’.

Thank you for your attention!

Best,

BugTest.tar.gz (118 KB)

akohlmey · April 22, 2016, 2:09am

Dear LAMMPS Developers:

I would like to report a possible bug related to pair_style hybrid.

no, this is not a bug. actually, it is a very well known "feature".

In short, the simulation results differ if one simply switches the order
of the sub-styles given to pair_style hybrid.
For your convenience, two nearly identical input files (only the
pair_style hybrid line differs) are attached. You can easily reproduce
the problem by following the brief instructions in README.txt.

How to reproduce this bug:

    step 1. go to test1 and run 'Run.sh' if you have 'lmp_mpi' installed
    (LAMMPS (16 Feb 2016))
    step 2. go to test2 and run 'Run.sh' if you have 'lmp_mpi' installed
    (LAMMPS (16 Feb 2016))

    NOTE: the ONLY difference between test1/run.in and test2/run.in is
    in the 5th line, where the order of the sub-styles is switched.
    test1 has "pair_style hybrid lj/cut/coul/long 13.0 lj/cut 13.0",
    lj/cut/coul/long appears first
    test2 has "pair_style hybrid lj/cut 13.0 lj/cut/coul/long 13.0",
    lj/cut appears first

    step 3. plot the second column (temp) against the step from
    test1/log.screen and test2/log.screen; you will get a figure like
    the one included in this folder.

I expected no difference between these two cases; however, after 10,000
steps, a visible difference appears.

this is not correct. you are ignoring the fact that LAMMPS uses floating
point math, and floating point math has two particular properties:

1) the resolution depends on the magnitude (there are twice as many
representable numbers between 1.0 and 2.0 as between 2.0 and 3.0)
2) floating point math does not commute, which means that results depend
on the order of execution. by swapping the two sub-styles in the
pair_style statement, you change the order in which forces are
accumulated, and thus you will get slightly different totals, since the
magnitude of the numbers differs.
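
both properties can be illustrated with a minimal Python sketch. this is not LAMMPS code: the names coul_terms, lj_terms and accumulate are hypothetical, and the values are deliberately exaggerated so that the effect shows up in a single sum rather than only in the last digits.

```python
import math

# Property 1: the gap between adjacent doubles grows with magnitude.
gap_near_1 = math.ulp(1.0)   # spacing of doubles in [1.0, 2.0)
gap_near_2 = math.ulp(2.0)   # spacing of doubles in [2.0, 4.0)
print(gap_near_2 / gap_near_1)  # 2.0

# Property 2: the total depends on the order of accumulation.
# Hypothetical per-sub-style force contributions on one atom
# (made-up, exaggerated values, not taken from the attached inputs).
coul_terms = [1.0e16, -1.0e16]
lj_terms = [1.0]

def accumulate(groups):
    """Add all terms to a running total, one group after the other."""
    total = 0.0
    for group in groups:
        for term in group:
            total += term
    return total

total_a = accumulate([coul_terms, lj_terms])  # large terms cancel first
total_b = accumulate([lj_terms, coul_terms])  # 1.0 is absorbed by 1.0e16
print(total_a, total_b)  # 1.0 0.0
```

in a real run the per-step difference is far smaller, typically in the last few bits of each force component, but it is not zero.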

the result of this is that trajectories will at some point begin to
diverge, and since MD is a chaotic system, the trajectories will diverge
exponentially.
the same thing will happen when you run on a different number of
processors, for example, or when you change the frequency of neighbor
list updates, or when you turn atom sorting on or off.
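
to see how a difference in the last digits can grow, here is a small Python sketch. it uses the logistic map as a stand-in for a chaotic system; that choice is an assumption made for brevity, it is not an MD integrator, and the function names are hypothetical.

```python
def logistic_step(x):
    # Logistic map with r = 4, a standard textbook example of chaos.
    return 4.0 * x * (1.0 - x)

def run(x0, nsteps):
    """Return the trajectory [x0, x1, ..., x_nsteps]."""
    traj = [x0]
    for _ in range(nsteps):
        traj.append(logistic_step(traj[-1]))
    return traj

# Two starting points that differ only at round-off level.
ref = run(0.3, 200)
perturbed = run(0.3 + 1.0e-15, 200)

# Absolute difference between the two trajectories at each step.
diffs = [abs(a - b) for a, b in zip(ref, perturbed)]
for step in (0, 10, 20, 30, 40, 50, 60):
    print(step, diffs[step])
```

the printed differences start at round-off level and grow until they are as large as the values themselves, which is the same qualitative behavior as two MD trajectories that start from slightly different force sums.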

This problem can be reproduced with LAMMPS version 'LAMMPS (16 Feb 2016)'.

this will happen with *any* version of LAMMPS (and similarly with any MD
code that uses floating point math).

THERE ARE TWO EXCELLENT BLOG POSTS ON THIS TOPIC THAT I RECOMMEND
EVERYBODY ON THIS LIST TO READ:

http://blog.reverberate.org/2014/09/what-every-computer-programmer-should.html

http://blog.reverberate.org/2016/02/06/floating-point-demystified-part2.html

axel.

akohlmey · April 22, 2016, 2:41am

correction. floating point math does commute (i.e. A+B == B+A), but the result of a sum depends on the order of summation because floating point addition is not associative (i.e. (A+B)+C != A+(B+C)). i always mix those two terms up.
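
a quick check in Python (any language with IEEE 754 doubles should behave the same way; the variable names are just for illustration):

```python
a, b, c = 0.1, 0.2, 0.3

# Commutative: swapping the two operands gives the same result.
commutes = (a + b) == (b + a)

# Not associative: regrouping three operands changes the rounding.
left = (a + b) + c
right = a + (b + c)

print(commutes)       # True
print(left, right)    # 0.6000000000000001 0.6
print(left == right)  # False
```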