Compiling LAMMPS with OpenMPI

Hello!

I'm trying to compile LAMMPS with OpenMPI-1.4.2, but I have not been
able to get a working executable: it always fails with a segfault.
The executable compiled with Makefile.serial works as intended.

The Makefile and output are below. There is something that I'm not
seeing here, but I don't know what it is. Any help would be appreciated.

Thomas

~/lammps-9Dec11/src $ cat MAKE/Makefile.michelin
...
CC = mpiCC
CCFLAGS = -O
DEPFLAGS = -M
LINK = mpiCC
LINKFLAGS = -O
...
MPI_INC =
MPI_PATH =
MPI_LIB =
LIB =
...

~ $ mpirun -np 2 lmp_serial -help
LAMMPS (9 Dec 2011)

List of style options included in this executable:
...

~ $ mpirun -np 1 lmp_michelin -help
[michelin:26163] *** Process received signal ***
[michelin:26163] Signal: Segmentation fault (11)
[michelin:26163] Signal code: Invalid permissions (2)
[michelin:26163] Failing at address: 0x7f327713b385
[michelin:26163] [ 0] /lib64/libpthread.so.0(+0x10310) [0x7f3276ba9310]
[michelin:26163] [ 1] lmp_michelin(_ZN9LAMMPS_NS8Universe9add_worldEPc+0x138) [0x547054]
[michelin:26163] [ 2] lmp_michelin(_ZN9LAMMPS_NS6LAMMPSC1EiPPcP19ompi_communicator_t+0x5db) [0x5f6a41]
[michelin:26163] [ 3] lmp_michelin(main+0x3d) [0x5e6845]
[michelin:26163] [ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f327511de9d]
[michelin:26163] [ 5] lmp_michelin() [0x473219]
[michelin:26163] *** End of error message ***
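Frames 1 and 2 of the backtrace above are C++ mangled names, which makes it hard to see where the crash happens. The following is a minimal sketch, assuming a GCC or Clang toolchain (Itanium C++ ABI), of decoding them with `abi::__cxa_demangle`. The helper name `demangle` is hypothetical and not part of LAMMPS or OpenMPI.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: return the readable form of a mangled symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With null buffer arguments, the result is malloc()-allocated
    // and must be released with free().
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

Applied to the two symbols from the trace, `demangle("_ZN9LAMMPS_NS8Universe9add_worldEPc")` gives `LAMMPS_NS::Universe::add_world(char*)`, and `demangle("_ZN9LAMMPS_NS6LAMMPSC1EiPPcP19ompi_communicator_t")` gives `LAMMPS_NS::LAMMPS::LAMMPS(int, char**, ompi_communicator_t*)`. So the faulting frame is inside `Universe::add_world`, called from the `LAMMPS` constructor.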

can you compile and run other MPI programs
with the same OpenMPI installation?

the error looks as if it originates from your MPI installation.

axel.

Hello!

can you compile and run other MPI programs
with the same OpenMPI installation?

The charm test suite that comes with NAMD 2.8 finishes without complaints.

~/NAMD_2.8_Source/charm-6.3.2/tests/charm++/megatest $ mpirun -np 20 -H node1,node2,node3,node4,node5 \
    ~/NAMD_2.8_Source/charm-6.3.2/tests/charm++/megatest/pgm
...
All tests completed, exiting
End of program

the error looks as if it originates from your MPI installation.

I had considered that, but how do I tell?

Thomas

the next best thing that you can try is to compile
with debug info (add -g to CCFLAGS and LINKFLAGS)
and run the executable through a debugger:

mpirun -np 1 gdb lmp_michelin

...

run -help

and then inspect the stack frame and
see which call exactly crashes where.

cheers,
    axel.

Hello!

the error looks as if it originates from your MPI installation.

This is true, in a way. The executable was compiled with option
LMP_INC = -DLAMMPS_MEMALIGN, and openmpi knows about malloc(), but not
about posix_memalign().

Ouch. Without that option, MPI runs go perfectly fine. Problem solved.

sorry, but this doesn't make much sense.

what is the relation between MPI and malloc()/posix_memalign()?

i've been using posix_memalign() *with* OpenMPI for years now,
on different platforms and with different versions of OpenMPI.

there must be something else that was miscompiled
and got cleaned out when you changed the makefile settings.

axel.

Hello!

This is true, in a way. The executable was compiled with option
LMP_INC = -DLAMMPS_MEMALIGN, and openmpi knows about malloc(), but not
about posix_memalign().

sorry, but this doesn't make much sense.

what is the relation between MPI and malloc()/posix_memalign()?

I was under the impression that OpenMPI intercepts calls to malloc(),
but not to posix_memalign()?

Whatever the cause for this bug is, it is reproducible. Segfault for
-DLAMMPS_MEMALIGN, no segfault without that definition.

Thomas
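To separate the allocator from MPI and LAMMPS, `posix_memalign()` can be exercised on its own. The following is a minimal sketch, not LAMMPS's allocator: the helper names `aligned_alloc_check` and `is_aligned` are hypothetical, and it assumes a POSIX system. It relies on one documented fact: `posix_memalign()` returns `EINVAL` unless the alignment is a power of two and a multiple of `sizeof(void *)`. The macro appears above without a value, and a bare `-DNAME` gives a macro the value 1. Whether that value reached `posix_memalign()` in this build is an assumption; the thread does not establish it.

```cpp
#include <cerrno>     // EINVAL
#include <cstdint>    // std::uintptr_t
#include <stdlib.h>   // posix_memalign, free (POSIX)

// Hypothetical helper: attempt an aligned allocation and report the
// posix_memalign() return code (0 on success). *out is set to the
// allocated pointer on success and to nullptr on failure.
int aligned_alloc_check(size_t alignment, size_t nbytes, void** out) {
    void* ptr = nullptr;
    int rc = posix_memalign(&ptr, alignment, nbytes);
    *out = (rc == 0) ? ptr : nullptr;
    return rc;
}

// Hypothetical helper: true if p is a multiple of alignment.
bool is_aligned(const void* p, size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With an alignment of 64, `aligned_alloc_check` should return 0 and a pointer for which `is_aligned(p, 64)` holds. With an alignment of 1 it should return `EINVAL` and no usable pointer. Code that ignores the return value and uses the pointer anyway can crash.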

that was changed many years ago. all openmpi does is
replace malloc with its own version for allocating pinned
memory, so that the memory is not returned to the kernel, for
performance reasons. neither OpenMPI nor any other MPI library has
any business interfering with memory allocations in an incompatible way.

Whatever the cause for this bug is, it is reproducible.

it certainly is not a bug in LAMMPS.
more likely it is a problem with the setup
of your cluster.

axel.