nan in output using airebo for graphene

I am trying to compute the forces on carbon atoms in a certain
configuration (the configuration corresponds to a symmetric tilt grain
boundary, however, this should have no significance). I have checked
out the exact same version of LAMMPS from the svn repository on two
different machines. On one machine I get reasonable answers, while on
the other I get nan. I have tried checking out the latest version of
LAMMPS and the problem persists. The machine on which things work
yields
$ uname -a
Linux mc1 3.0.0-13-generic #22-Ubuntu SMP Wed Nov 2 13:25:36 UTC 2011
i686 i686 i386 GNU/Linux

while the machine where things don't work yields:
$ uname -a
Linux mc2 2.6.18-274.7.1.el5 #1 SMP Thu Oct 20 15:20:36 EDT 2011
x86_64 x86_64 x86_64 GNU/Linux

I run LAMMPS by invoking the following:
lmp_serial < input

NOTE: The relevant files are temporarily available at:
http://dhruv.ccmr.cornell.edu/ashivni/

The input script is also appended (below are the contents of file called input):

clear
variable dump_file string "./trj"
variable data_file string "./data"
units metal
boundary s p p
atom_modify sort 0 0.0
neighbor 2.0 bin

read_data ./data

### interactions
pair_style airebo 2.0
pair_coeff * * CH.airebo C
mass * 12.0

### run
fix fix_nve all nve
dump dump_all all custom 1 ./trj id type x y z vx vy vz fx fy fz
thermo_style custom step temp press cpu pxx pyy pzz pxy pxz pyz ke pe
etotal vol lx ly lz atoms
thermo_modify flush yes
thermo 1
run 0
log /dev/stdout

The contents of the file named data are appended below:
./data

51 atoms
1 atom types
0.0 11.174734400 xlo xhi
0.0 16.935806774 ylo yhi
0.0 10.000000000 zlo zhi

Atoms

     1 1 3.467160896 0.259221532 5.000000000
     2 1 2.569191168 2.505808145 5.000000000
     3 1 3.666709725 1.641736371 5.000000000
     4 1 2.768739996 3.888322984 5.000000000
     5 1 4.065807382 4.406766048 5.000000000
     6 1 4.265356211 5.789280887 5.000000000
     7 1 1.671221439 4.752394758 5.000000000
     8 1 1.870770268 6.134909597 5.000000000
     9 1 3.167837654 6.653352661 5.000000000
    10 1 2.269867925 8.899939274 5.000000000
    11 1 3.367386482 8.035867500 5.000000000
    12 1 2.469416754 10.282454113 5.000000000
    13 1 4.664453868 8.554310564 5.000000000
    14 1 3.766484139 10.800897177 5.000000000
    15 1 4.864002696 9.936825403 5.000000000
    16 1 3.966032968 12.183412016 5.000000000
    17 1 1.571447025 12.529040726 5.000000000
    18 1 2.868514411 13.047483790 5.000000000
    19 1 1.970544682 15.294070403 5.000000000
    20 1 3.068063239 14.429998629 5.000000000
    21 1 2.170093511 16.676585242 5.000000000
    22 1 4.365130625 14.948441694 5.000000000
    23 1 4.564679454 16.330956532 5.000000000
    24 1 7.707573504 0.259221532 5.000000000
    25 1 8.605543232 2.505808145 5.000000000
    26 1 7.508024675 1.641736371 5.000000000
    27 1 8.405994404 3.888322984 5.000000000
    28 1 5.587367200 2.160179435 5.000000000
    29 1 7.108927018 4.406766048 5.000000000
    30 1 5.587367200 3.542694274 5.000000000
    31 1 6.909378189 5.789280887 5.000000000
    32 1 5.587367200 6.307723952 5.000000000
    33 1 9.503512961 4.752394758 5.000000000
    34 1 9.303964132 6.134909597 5.000000000
    35 1 8.006896746 6.653352661 5.000000000
    36 1 8.904866475 8.899939274 5.000000000
    37 1 7.807347918 8.035867500 5.000000000
    38 1 8.705317646 10.282454113 5.000000000
    39 1 6.510280532 8.554310564 5.000000000
    40 1 7.408250261 10.800897177 5.000000000
    41 1 6.310731704 9.936825403 5.000000000
    42 1 7.208701432 12.183412016 5.000000000
    43 1 5.587367200 12.701855081 5.000000000
    44 1 5.587367200 14.084369919 5.000000000
    45 1 9.603287375 12.529040726 5.000000000
    46 1 8.306219989 13.047483790 5.000000000
    47 1 9.204189718 15.294070403 5.000000000
    48 1 8.106671161 14.429998629 5.000000000
    49 1 9.004640889 16.676585242 5.000000000
    50 1 6.809603775 14.948441694 5.000000000
    51 1 6.610054946 16.330956532 5.000000000

lmp_serial -h tells me the following on both machines:
LAMMPS (19 Dec 2011)

List of style options included in this executable:

Atom styles: atomic charge ellipsoid hybrid line sphere tri

Integrate styles: respa verlet

Minimize styles: cg fire hftn quickmin sd

Pair styles: adp airebo born/coul/wolf born buck/coul/cut buck comb
coul/cut coul/debye coul/wolf dpd dpd/tstat eam/alloy eam/fs eam eim
gauss hybrid hybrid/overlay lj96/cut lj/class2/coul/cut
lj/class2/coul/long lj/class2 lj/cubic lj/cut/coul/cut
lj/cut/coul/debye lj/cut lj/expand lj/gromacs/coul/gromacs lj/gromacs
lj/smooth morse rebo soft sw table tersoff tersoff/zbl yukawa

Bond styles: class2 hybrid

Angle styles: class2

Dihedral styles: class2

Improper styles: class2

KSpace styles:

Fix styles (upper case are only for internal use): adapt addforce
ave/atom ave/correlate aveforce ave/histo ave/spatial ave/time
box/relax deform deposit drag dt/reset efield enforce2d evaporate
external gravity heat indent langevin lineforce MINIMIZE momentum move
nph nph/sphere npt npt/sphere nve nve/limit nve/noforce nve/sphere nvt
nvt/sllod nvt/sphere orient/fcc planeforce press/berendsen print
qeq/comb READ_RESTART recenter RESPA restrain rigid rigid/nve
rigid/nvt setforce shake SHEAR_HISTORY spring spring/rg spring/self
store/force store/state temp/berendsen temp/rescale
thermal/conductivity tmd ttm viscosity viscous wall/harmonic
wall/lj126 wall/lj93 wall/reflect wall/region

Compute styles: angle/local atom/molecule bond/local centro/atom
cluster/atom cna/atom com com/molecule coord/atom dihedral/local
displace/atom erotate/sphere group/group gyration gyration/molecule
heat/flux improper/local ke/atom ke msd msd/molecule pair pair/local
pe/atom pe pressure property/atom property/local property/molecule rdf
reduce reduce/region slice stress/atom temp/com temp/deform temp
temp/partial temp/profile temp/ramp temp/region temp/sphere ti

Region styles: block cone cylinder intersect plane prism sphere union

Dump styles: atom cfg custom dcd image local xyz

Command styles (add-on input script commands): change_box create_atoms
create_box delete_atoms delete_bonds displace_atoms displace_box
minimize read_data read_restart replicate run set velocity
write_restart

Any help will be appreciated!!

I am trying to compute the forces on carbon atoms in a certain
configuration (the configuration corresponds to a symmetric tilt grain
boundary, however, this should have no significance). I have checked
out the exact same version of LAMMPS from the svn repository on two
different machines. On one machine I get reasonable answers, while on
the other I get nan. I have tried checking out the latest version of
LAMMPS and the problem persists. The machine on which things work
yields
$ uname -a
Linux mc1 3.0.0-13-generic #22-Ubuntu SMP Wed Nov 2 13:25:36 UTC 2011
i686 i686 i386 GNU/Linux

while the machine where things don't work yields:
$ uname -a
Linux mc2 2.6.18-274.7.1.el5 #1 SMP Thu Oct 20 15:20:36 EDT 2011
x86_64 x86_64 x86_64 GNU/Linux

one observation. one machine is apparently running red hat enterprise
linux 5 (or a clone of it) and that has a fairly old compiler, while the other
machine seems to be running new-ish ubuntu installation. the latter
has much newer compilers. it is quite possible, that you are running into
a compiler optimization issue. the surprising thing is that it breaks on
the older machine, which uses a quite common software stack.

i would suggest to reduce the optimization level in the makefile
and check if the problem persists. if not, it could just be that your
system is marginal and works on one machine by chance.

axel.

I think it is a bug in the 64 bit LAMMPS. I tried compiling LAMMPS on
a newer 64 bit machine running ubuntu 11.10 and got the same result.

However, I was able to fix things by compiling with the -m32 flag for
g++. This generates a 32 bit executable, and I do not get nan anymore!

--Ashivni.

I think it is a bug in the 64 bit LAMMPS. I tried compiling LAMMPS on
a newer 64 bit machine running ubuntu 11.10 and got the same result.

However, I was able to fix things by compiling with the -m32 flag for
g++. This generates a 32 bit executable, and I do not get nan anymore!

no. i don't think it fixes it, it just confirms my suspicion.
i also don't think that this is a bug based on this kind of
observation. please produce a proper test case.

there is not much 64-bit specific code that is likely
to have the kind of effect that you describe, however
on different architectures and with different number
of general purpose registers, you are likely to get
slightly different results when summing up, forces
or computing splines and that can take a marginal
input easily from working to non-working and back.

axel.

Alex,
   You are right in that I am not providing a very informative bug
report; perhaps I should. However, I am sure that things get fixed
when I use the 32 bit binary because I can see the output, and I get
reasonable answers, which are consistent with the results of the same
calculation on other machines.
    I do not know how to file a bug report. I have already provided
the input files, the version of LAMMPS that I am using and the version
of my OS. I will be glad to supply any more information. Please let me
know.

--Ashivni.

Alex,
You are right in that I am not providing a very informative bug
report; perhaps I should. However, I am sure that things get fixed
when I use the 32 bit binary because I can see the output, and I get
reasonable answers, which are consistent with the results of the same
calculation on other machines.
I do not know how to file a bug report. I have already provided
the input files, the version of LAMMPS that I am using and the version
of my OS. I will be glad to supply any more information. Please let me
know.

that'll do. i can have a look at it.

i have already forgotten that you made that information
available. the memory of people that answer a lot of
emails suffers from frequent cache re-fills, i.e. they
don't always recall the entire content of a conversation,
but only the last one or two exchanges.

axel.

Alex,
You are right in that I am not providing a very informative bug
report; perhaps I should. However, I am sure that things get fixed
when I use the 32 bit binary because I can see the output, and I get

it is not a 64-bit issue, because i see the NaNs show up and
disappear with different (64-bit) compiler settings. that narrows
down the likely sources of problems to:
- compiler optimization (unlikely, since it shows up with *lower* optimization)
- use of uninitialized data somewhere
- out of bounds access of data and thus data corruption
  on the stack. with different optimization (and bitness),
  the stack is organized differently.

time to run the code through valgrind and see if something pops up.
keep your fingers crossed.

cheers,
    axel.

it either due to use of an uninitialized vaiable

ok,

i think i have nailed this one down.
please try out the attached modified sources.

the problem is that some of the checks for
flat structures in PairAIREBO::bondorderLJ()
only check for exact discontinuity without
taking into account rounding issues and that
limited accuracy in floating point operations
can lead to division by zero even for coordinates
that are not perfectly flat. if and when this
scenario happens is affected by compiler
optimizations (which in turn depend on the
platform). you can check out the issue, for
example, by changing the rightmost digit of
the x coordinate of atom 30 (which is the
major offender in this case) from 0 to 1.
with this tiny change that has no significant
impact on forces and energy, the NaNs
are gone. please see below for the core
of the applied changes.

cheers,
    axel.

diff --git a/src/MANYBODY/pair_airebo.cpp b/src/MANYBODY/pair_airebo.cpp
index fcbbc1e..f93ce65 100644
--- a/src/MANYBODY/pair_airebo.cpp
+++ b/src/MANYBODY/pair_airebo.cpp
@@ -2192,7 +2194,7 @@ double PairAIREBO::bondorderLJ(int i, int j,
double rij[3], double rijmag,
   costmp = 0.5*(rij2+rik2-rjk2)/rijmag/rikmag;
   tspjik = Sp2(costmp,thmin,thmax,dtsjik);

- if (sqrt(1.0 - cos321*cos321) != 0.0) {
+ if (sqrt(1.0 - cos321*cos321) > sqrt(TOL)) {
     wik = Sp(rikmag,rcmin[itype][ktype],rcmaxp[itype][ktype],dwik);
     REBO_neighs_j = REBO_firstneigh[j];
     for (l = 0; l < REBO_numneigh[j]; l++) {
@@ -2217,7 +2219,7 @@ double PairAIREBO::bondorderLJ(int i, int j,
double rij[3], double rijmag,
         costmp = 0.5*(rij2+rjl2-ril2)/rijmag/rjlmag;
         tspijl = Sp2(costmp,thmin,thmax,dtsijl);

- if (sqrt(1.0 - cos234*cos234) != 0.0) {
+ if (sqrt(1.0 - cos234*cos234) > sqrt(TOL)) {
     wjl = Sp(rjlmag,rcmin[jtype][ltype],rcmaxp[jtype][ltype],dS);
     crosskij[0] = (rij[1]*rik[2]-rij[2]*rik[1]);
     crosskij[1] = (rij[2]*rik[0]-rij[0]*rik[2]);

pair_airebo_omp.cpp.gz (15.1 KB)

pair_airebo.cpp.gz (21.7 KB)