Please post to the list, not just to me.
I don’t know why this is happening, since I can’t reproduce it.
Maybe Axel will have an idea. You are running this on
a single proc, on what machine? Is there anything different
about the machine, e.g. is its endian order for binary data
different than most other machines?
Steve
neither can i.
i would suspect a compiler issue. if you can install an alternate GCC
version and check whether the issue carries over, that would be helpful to know.
it would be helpful to have the output of
uname -a
cat /etc/issue
gcc -v 2>&1 | tail -1
I think I've now found a way to resolve the problem. It does seem to be a compilation issue, and I've found that changing some of the compiler options fixes the problem.
I've been running on an Intel x86_64 machine; output of the requested commands:
uname -a
Linux frontend04 2.6.32-220.23.1.el6.x86_64 #1 SMP Mon Jun 18 09:58:09 CDT 2012 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/issue
Scientific Linux release 6.2 (Carbon)
Kernel \r on an \m
$ gcc -v 2>&1 | tail -1
gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC)
and I've been compiling using the provided Makefile.openmpi without any modification. I also tried compiling with gcc 4.4.7 and with MPICH2 instead of OpenMPI (and combinations of those), and always got the same bug. Compiling with the Intel C++ compiler and MPICH using Makefile.linux does not reproduce the error, and neither does compiling without MPI using Makefile.serial with the same gcc versions.
I noticed that there are several compiler optimization options in the provided Makefile.openmpi, and some experimentation showed that removing the -funroll-loops option resolves the problem. I don't know much about these optimization options, so I have no idea why this could cause the problem, or whether it's a general issue or specific to my machine setup. Also, this makefile worked fine for earlier versions of LAMMPS (I tried 21Feb2013).
Thanks for your help,
Chris
yes. this is the kind of issue i was suspecting, which is why i was
asking for the details. it only affects the gcc 4.4.x that is shipped
with RHEL 6.x or CentOS 6.x, and only when you use -O2 with several
additional flags, particularly -funroll-loops.
it seems to miscompile this piece of code:
buf[m] = 0.0; // for valgrind
*((tagint *) &buf[m++]) = image[i];
in the various AtomVec classes. the older version, which was working
correctly, didn't have the line with '// for valgrind'.
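In other words, the second line stores the bit pattern of an integer (the image flag) in one slot of the double-precision communication buffer. Below is a minimal C++ sketch of that pack/unpack step written with std::memcpy, which standard C++ defines for copying an object's bytes, instead of writing through a cast pointer. The function names and the 32-bit tagint typedef are hypothetical, chosen for illustration; this is not the actual AtomVec code. Assuming tagint is narrower than a double, zeroing the slot first keeps its leftover bytes defined, which is presumably what the '// for valgrind' line is for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the tagint typedef; assumed 32-bit here so the
// integer fills only part of an 8-byte double slot.
using tagint = std::int32_t;

static_assert(sizeof(tagint) <= sizeof(double),
              "an integer must fit inside one double slot");

// Store an integer's bytes in buf[m] and advance m. Zeroing the slot first
// leaves the bytes not covered by the integer defined; std::memcpy copies the
// object representation without writing through a cast pointer.
void pack_via_memcpy(std::vector<double>& buf, std::size_t& m, tagint value) {
  assert(m < buf.size());
  buf[m] = 0.0;
  std::memcpy(&buf[m], &value, sizeof(value));
  ++m;
}

// Recover the integer written by pack_via_memcpy from buf[m] and advance m.
tagint unpack_via_memcpy(const std::vector<double>& buf, std::size_t& m) {
  assert(m < buf.size());
  tagint value;
  std::memcpy(&value, &buf[m], sizeof(value));
  ++m;
  return value;
}
```

Packing and unpacking must walk the buffer in the same order, just as the pack/unpack pairs in the AtomVec classes do.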
i suspect we'll be better off removing that line and living with the
valgrind complaints, so that compilers don't miscompile this, until
somebody has the time to re-implement this piece using a proper union,
which will remove the need for the legal-but-somewhat-unusual code in
the second line.
axel.
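A sketch of the union approach suggested above, under stated assumptions: the union name Slot, the function names, and widening every packed integer to 64 bits are hypothetical choices for illustration, not existing LAMMPS code. Because the integer member is as wide as the double, writing it defines all 8 bytes of the slot, so the separate buf[m] = 0.0 store is no longer needed. Note that GCC documents reading a union member other than the one most recently written as supported even with -fstrict-aliasing, but ISO C++ does not guarantee it; std::memcpy (or std::bit_cast in C++20) is the strictly portable alternative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical union: one 8-byte buffer slot viewed either as a double or as
// a 64-bit integer. Writing the integer member covers the whole slot.
union Slot {
  double d;
  std::int64_t i;
};

static_assert(sizeof(Slot) == sizeof(double),
              "Slot must occupy exactly one buffer element");

// Pack: write through the integer member, then store the slot's double view.
// Reading the other member is type punning through a union (a GCC-supported
// extension, not guaranteed by ISO C++).
void pack_via_union(std::vector<double>& buf, std::size_t& m, std::int64_t value) {
  assert(m < buf.size());
  Slot s;
  s.i = value;
  buf[m++] = s.d;
}

// Unpack: load the double view of the slot and read back the integer member.
std::int64_t unpack_via_union(const std::vector<double>& buf, std::size_t& m) {
  assert(m < buf.size());
  Slot s;
  s.d = buf[m++];
  return s.i;
}
```

The design trades a slightly wider integer for a single, fully initialized store per slot, which addresses both the valgrind warning and the two-store sequence quoted above.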