CentOS 8 break

Has anyone else had trouble with LAMMPS on CentOS 8? It appears that LAMMPS may have been broken by one of the recent updates to CentOS 8 or its repositories.

I was installing some new software on our CentOS 8 workstation and, as part of the process, ran “yum update” for maintenance. Several days later another user on the workstation reported that they were getting “segmentation 11” faults when running LIGGGHTS scripts that had previously run without error.

I tried running the example scripts and got the same error. Then I downloaded a fresh clone of LIGGGHTS and recompiled, with no improvement. Finally, tonight, I reinstalled CentOS 8 from scratch, installed LIGGGHTS again, and still got the same error.

At this point I decided to test LAMMPS to see whether the failure went deeper than LIGGGHTS, and found that I get the same segmentation fault.

I am running a clean copy of the most recent stable LAMMPS release. I compiled with “make mpi” and attempted to run the granular example, in.pour.drum, on a single processor.
Here is my output:

$ mpirun -np 1 lmp_mpi -in in.pour.drum
[mfg-67:74816:0:74816] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7f7211277768)
==== backtrace ====
0 /lib64/libucs.so.0(+0x18bb0) [0x7f7210c0abb0]
1 /lib64/libucs.so.0(+0x18d8a) [0x7f7210c0ad8a]
2 /lib64/libuct.so.0(+0x1655b) [0x7f721233e55b]
3 /lib64/ld-linux-x86-64.so.2(+0xfd0a) [0x7f72202c3d0a]
4 /lib64/ld-linux-x86-64.so.2(+0xfe0a) [0x7f72202c3e0a]
5 /lib64/ld-linux-x86-64.so.2(+0x13def) [0x7f72202c7def]
6 /lib64/libc.so.6(_dl_catch_exception+0x77) [0x7f721ef93ab7]
7 /lib64/ld-linux-x86-64.so.2(+0x1365e) [0x7f72202c765e]
8 /lib64/libdl.so.2(+0x11ba) [0x7f721e6ed1ba]
9 /lib64/libc.so.6(_dl_catch_exception+0x77) [0x7f721ef93ab7]
10 /lib64/libc.so.6(_dl_catch_error+0x33) [0x7f721ef93b53]
11 /lib64/libdl.so.2(+0x1939) [0x7f721e6ed939]
12 /lib64/libdl.so.2(dlopen+0x4a) [0x7f721e6ed25a]
13 /usr/lib64/openmpi/lib/libopen-pal.so.40(+0x6df05) [0x7f721e95df05]
14 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_repository_open+0x206) [0x7f721e93bb16]
15 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_find+0x35a) [0x7f721e93aa5a]
16 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_components_register+0x2e) [0x7f721e9463ce]
17 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_register+0x252) [0x7f721e9468b2]
18 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_open+0x15) [0x7f721e946915]
19 /usr/lib64/openmpi/lib/libmpi.so.40(ompi_mpi_init+0x674) [0x7f721fdbe494]
20 /usr/lib64/openmpi/lib/libmpi.so.40(MPI_Init+0x72) [0x7f721fdee6b2]
21 lmp_mpi() [0x40721b]
22 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f721ee7e873]
23 lmp_mpi() [0x4072ae]

Looking at your stack trace, it doesn’t look like it is LAMMPS that has been broken, but rather OpenMPI. The segfault happens during application startup, inside MPI_Init(), before LAMMPS has done any work of its own: OpenMPI is still doing its internal initialization and loading its plugins (the crash is in libucs/libuct, the UCX libraries, which are being pulled in via dlopen).
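
For anyone who wants to check that reading of the trace mechanically, here is a minimal Python sketch. It assumes the "index path(symbol+offset) [address]" frame layout seen in the post above. The frames are copied from that post but abridged to a subset, and the helper names (parse_frames, library_chain, called_from_app) are made up for illustration. It lists which binaries the crash passes through, innermost first, and shows what the lmp_mpi executable itself had called.

```python
import os
import re

# A subset of the frames posted above, innermost frame first.
BACKTRACE = """\
0 /lib64/libucs.so.0(+0x18bb0) [0x7f7210c0abb0]
2 /lib64/libuct.so.0(+0x1655b) [0x7f721233e55b]
3 /lib64/ld-linux-x86-64.so.2(+0xfd0a) [0x7f72202c3d0a]
12 /lib64/libdl.so.2(dlopen+0x4a) [0x7f721e6ed25a]
14 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_repository_open+0x206) [0x7f721e93bb16]
19 /usr/lib64/openmpi/lib/libmpi.so.40(ompi_mpi_init+0x674) [0x7f721fdbe494]
20 /usr/lib64/openmpi/lib/libmpi.so.40(MPI_Init+0x72) [0x7f721fdee6b2]
21 lmp_mpi() [0x40721b]
22 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f721ee7e873]
23 lmp_mpi() [0x4072ae]
"""

# index, binary path, "symbol+offset" (possibly empty), address
FRAME_RE = re.compile(r"^(\d+)\s+([^\s(]+)\(([^)]*)\)\s+\[0x[0-9a-f]+\]$")


def parse_frames(text):
    """Return (index, binary name, symbol name) for every frame line."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            index, path, symbol = match.groups()
            # The symbol is '' when only an offset such as +0x18bb0 is given.
            name = symbol.split("+")[0]
            frames.append((int(index), os.path.basename(path), name))
    return frames


def library_chain(frames):
    """Binaries the trace passes through, innermost first, repeats merged."""
    chain = []
    for _, lib, _ in frames:
        if not chain or chain[-1] != lib:
            chain.append(lib)
    return chain


def called_from_app(frames, app_name):
    """The frame the application called directly (the one just inside it)."""
    for pos, (_, lib, _) in enumerate(frames):
        if lib == app_name:
            return frames[pos - 1] if pos > 0 else None
    return None


if __name__ == "__main__":
    frames = parse_frames(BACKTRACE)
    print(" <- ".join(library_chain(frames)))
    print("lmp_mpi called:", called_from_app(frames, "lmp_mpi"))
```

The innermost frames sit in the UCX and dynamic-loader libraries, and the only thing the lmp_mpi frames did was call MPI_Init, which is why the suspicion falls on the OpenMPI installation rather than on LAMMPS.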

To confirm, you could do the following:

  • compile LAMMPS with “make serial” and see if the issue persists.
  • compile a “hello world” style MPI program with OpenMPI and see if the same issue happens

My expectation is that the first will run but the second won’t. In that case, you could check whether there is already a bug report and/or a workaround for it.
If you need parallel processing, it might be worth trying to compile LAMMPS (and LIGGGHTS) with MPICH instead of OpenMPI.

Axel.

Haha, that explains why the problem shows up in both LAMMPS and LIGGGHTS. My inexperience appears to be showing, though compiling a serial version was the next thing I was going to try.

You are correct: the MPI hello world does not work, while serial LAMMPS does. I guess I just assumed that CentOS 8 would be stable, didn’t look deeply enough at what was going on, and defaulted to looking at LIGGGHTS and LAMMPS.

Thank you,