[lammps-users] LAMMPS openmpi with valgrind?

Hi. I am trying to add some tricky new code to LAMMPS, and my code likely has some bugs at the moment. On systems with OpenMPI (my desktop, but also spirit and others) valgrind lists errors even for the stock LAMMPS code, running stock examples. For example, on spirit, running the example “melt” (with the stock 6/26/08 version of LAMMPS) gives 14 errors on 2 processors. Errors like:

==8430== Syscall param writev(vector[...]) points to uninitialised byte(s)
==8430== at 0x558FC57: writev (in /lib64/tls/libc-2.3.4.so)
==8430== by 0x652B85D: mca_oob_tcp_msg_send_handler (in /projects/nwcc/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_oob_tcp.so)
==8430== Address 0x423D6A1 is not stack'd, malloc'd or (recently) free'd

Now it seems (from Googling) that this is a common problem with valgrind and OpenMPI, i.e. it doesn’t necessarily have anything to do with LAMMPS. But it sure does make debugging tougher. Has anyone dealt with this problem before, and if so, did you find a way around it?

Thanks,
Rob

> Hi. I am trying to add some tricky new code to LAMMPS, and my code likely
> has some bugs at the moment. On systems with OpenMPI (my desktop, but also
> spirit and others) valgrind lists errors even for the stock LAMMPS code,

how about running with the MPI stubs library first?

> running stock examples. For example, on spirit, running the example "melt"
> (with the stock 6/26/08 version of LAMMPS) gives 14 errors on 2 processors.
> Errors like:
> ==8430== Syscall param writev(vector[...]) points to uninitialised byte(s)
> ==8430== at 0x558FC57: writev (in /lib64/tls/libc-2.3.4.so)
> ==8430== by 0x652B85D: mca_oob_tcp_msg_send_handler (in /projects/nwcc/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_oob_tcp.so)
> ==8430== Address 0x423D6A1 is not stack'd, malloc'd or (recently) free'd

> Now it seems (from Googling) that this is a common problem with valgrind and
> OpenMPI, i.e. it doesn't necessarily have anything to do with LAMMPS. But

your compile is probably using the bundled ptmalloc that keeps track of
allocations and can re-use memory blocks without having to malloc them
again. you could try compiling a version without that. for general
debugging, you usually don't need to run at full speed and with all
features for most of the time...

> it sure does make debugging tougher. Has anyone dealt with this problem
> before, and if so did you find a way around it?

nope. i always write bug-free code. ;-)

cheers,
   axel.
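
[Editor's note] While debugging, one way to cope with the noise discussed above is to post-filter the valgrind log. The following is only a minimal sketch in Python, not something proposed in the thread: the helper names, the marker strings "openmpi" and "mca_", and the second record in SAMPLE (an invented error in user code) are all assumptions made for illustration.

```python
import re

# Valgrind prefixes each of its lines with "==PID=="; a line with nothing
# after the prefix separates one error record from the next.
PREFIX = re.compile(r"^==\d+== ?")

def split_records(log_text):
    """Group valgrind lines into records separated by empty '==PID==' lines."""
    records, current = [], []
    for line in log_text.splitlines():
        if not PREFIX.match(line):
            continue  # ordinary program output, not valgrind output
        body = PREFIX.sub("", line, count=1)
        if body.strip():
            current.append(body)
        elif current:
            records.append(current)
            current = []
    if current:
        records.append(current)
    return records

def is_mpi_noise(record, markers=("openmpi", "mca_")):
    """Assumed heuristic: any stack frame naming an OpenMPI object means noise."""
    frames = [l for l in record if l.lstrip().startswith(("at 0x", "by 0x"))]
    return any(m in f for f in frames for m in markers)

def filter_log(log_text):
    """Return only the records that do not look like OpenMPI-internal noise."""
    return [r for r in split_records(log_text) if not is_mpi_noise(r)]

# First record: the error quoted in the thread. Second record: hypothetical.
SAMPLE = """\
==8430== Syscall param writev(vector[...]) points to uninitialised byte(s)
==8430==    at 0x558FC57: writev (in /lib64/tls/libc-2.3.4.so)
==8430==    by 0x652B85D: mca_oob_tcp_msg_send_handler (in /projects/nwcc/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_oob_tcp.so)
==8430==  Address 0x423D6A1 is not stack'd, malloc'd or (recently) free'd
==8430==
==8430== Invalid read of size 4
==8430==    at 0x400544: my_new_function (my_fix.cpp:42)
==8430==    by 0x400600: main (main.cpp:10)
"""

if __name__ == "__main__":
    for record in filter_log(SAMPLE):
        print("\n".join(record))
        print()
```

Note the limitation of this heuristic: it also hides real bugs in which your own code hands an uninitialised buffer to an MPI call, because those records contain OpenMPI frames too. Treat it as a first pass, not as proof that the remaining code is clean.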

OpenMPI does some magic around memory allocation that causes Valgrind to get angry. This is much worse when running on RDMA-type systems (InfiniBand, etc.).

I would post this to the OpenMPI mailing list. I follow that list, and the developers recently did some work that makes OpenMPI much more Valgrind-friendly.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
(734)936-1985
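
[Editor's note] Another common way to quiet reports that come from inside a library is a Valgrind suppression file, passed with Valgrind's `--suppressions=FILE` option. The sketch below turns the "Syscall param" record quoted in this thread into one suppression entry. It is an illustration under stated assumptions, not a method from the thread: the helper `make_suppression` and the entry name `openmpi_oob_tcp_writev` are hypothetical, and only "Syscall param" records are handled.

```python
import re

def make_suppression(name, record_lines, max_frames=2):
    """Build a Memcheck 'Param' suppression entry from one valgrind record."""
    header = re.search(r"Syscall param (\S+) points to", record_lines[0])
    if header is None:
        raise ValueError("this sketch only handles 'Syscall param' records")
    funcs = []
    for line in record_lines[1:]:
        # Stack frames look like "at 0xADDR: function (in object)".
        frame = re.search(r"\b(?:at|by) 0x[0-9A-Fa-f]+: (\S+)", line)
        if frame:
            funcs.append(frame.group(1))
    lines = ["{", f"   {name}", "   Memcheck:Param", f"   {header.group(1)}"]
    lines += [f"   fun:{fn}" for fn in funcs[:max_frames]]
    lines.append("}")
    return "\n".join(lines) + "\n"

# The record quoted in the thread, one list item per valgrind line.
RECORD = [
    "==8430== Syscall param writev(vector[...]) points to uninitialised byte(s)",
    "==8430== at 0x558FC57: writev (in /lib64/tls/libc-2.3.4.so)",
    "==8430== by 0x652B85D: mca_oob_tcp_msg_send_handler (in /projects/nwcc/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_oob_tcp.so)",
    "==8430== Address 0x423D6A1 is not stack'd, malloc'd or (recently) free'd",
]

if __name__ == "__main__":
    print(make_suppression("openmpi_oob_tcp_writev", RECORD), end="")
```

Valgrind can also print ready-made entries itself when run with `--gen-suppressions=all`, which is usually less error-prone than writing them by hand. Keep the frame list as specific as you can so that similar errors in your own code are not suppressed by accident.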