I don’t know why this is happening, since I can’t reproduce it.
Maybe Axel will have an idea. You are running this on
a single proc, on what machine? Is there anything different
about the machine, e.g. is its endian order for binary data
different than most other machines?
I think I've now found a way to resolve the problem. It does seem to be a compilation issue, and I've found that changing some of the compiler options fixes the problem.
I've been running on an Intel x86_64 machine; output of requested commands:
$ uname -a
Linux frontend04 2.6.32-220.23.1.el6.x86_64 #1 SMP Mon Jun 18 09:58:09 CDT 2012 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/issue
Scientific Linux release 6.2 (Carbon)
Kernel \r on an \m
$ gcc -v 2>&1 | tail -1
gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC)
and I've been compiling using the provided Makefile.openmpi without any modification. I also tried compiling with gcc 4.4.7, and with MPICH2 instead of OpenMPI (and combinations of those), and always got the same bug. Compiling with the Intel C++ compiler and MPICH using Makefile.linux does not reproduce the error; compiling without MPI using Makefile.serial with the same gcc versions also does not reproduce it.
I noticed that there are several compiler optimisation options in the provided Makefile.openmpi, and after some experimentation found that removing the -funroll-loops option resolved the problem. I don't know much about these optimisation options, so I don't have any idea why this could cause the problem, or whether it's a general issue or specific to my machine set-up. Also, this makefile worked fine for earlier versions of LAMMPS (I tried 21Feb2013).
yes. this is the kind of issue that i was suspecting is the case and
that is why i was asking for the details. this only affects the gcc
4.4.x that is shipped with RHEL 6.x or CentOS 6.x and when you use -O2
with several additional flags, particularly -funroll-loops.
in the various AtomVec classes. the older version that was working
correctly, didn't have the line with '// for valgrind'.
i suspect we'll be better off to have valgrind complain and compilers
not miscompile this until somebody has the time to re-implement this
piece using a proper union that will remove the need for the
legal-but-somewhat-unusual code in the second line.
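
for illustration only, a union-based version of this kind of packing could look roughly like the sketch below. all names here (tagint_t, slot, pack_tag, unpack_tag) are made up for the example and are not taken from the LAMMPS source. it assumes a 64-bit integer tag, so that the integer fills the whole double-sized buffer slot and no bytes are left uninitialized. reading the union member that was not written last is supported by gcc as a documented extension, but it is not guaranteed by the C++ standard itself.

```cpp
#include <cstdint>

// Hypothetical integer type for atom tags; 64 bits wide so that it
// occupies exactly one double-sized buffer slot.
typedef std::int64_t tagint_t;

// One communication-buffer slot, viewed either as a double or as an integer.
union slot {
  double d;
  tagint_t i;
  slot(double arg) : d(arg) {}
  slot(tagint_t arg) : i(arg) {}
};

static_assert(sizeof(slot) == sizeof(double),
              "integer must fit exactly into one double slot");

// Store an integer tag in the next slot of a double buffer.
// Relies on union type punning (a GCC extension, see the note above).
inline void pack_tag(double *buf, int &m, tagint_t tag) {
  buf[m++] = slot(tag).d;
}

// Read an integer tag back from the next slot of a double buffer.
inline tagint_t unpack_tag(const double *buf, int &m) {
  return slot(buf[m++]).i;
}
```

with this, the packing code needs neither a pointer cast on the buffer nor a dummy store to keep valgrind quiet, because every byte of the slot is written through the union.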