Hello everyone,
I am using LAMMPS version 29 Aug 2024 (and have also tried releases up to a year older) on Linux (Ubuntu, RHEL, Rocky Linux).
I am using the LAMMPS Python module to automatically launch and manage multiple LAMMPS instances. While initialising these simulations, I noticed that some of them crash with segmentation faults. To keep things moving, I encapsulated each individual LAMMPS instance in a child process, so that a fault of this kind only takes down that child.
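A minimal sketch of this kind of per-instance isolation follows. It is an illustration under assumptions, not the exact setup described here: it assumes each instance runs one input file in a fresh Python interpreter, and `CHILD_SCRIPT` and `run_isolated` are hypothetical names. Only `lammps()`, `file()` and `close()` come from the LAMMPS Python module.

```python
import signal
import subprocess
import sys

# Hypothetical child script: every child creates its own LAMMPS instance
# and runs the input file passed as its first argument.
CHILD_SCRIPT = """
import sys
from lammps import lammps

lmp = lammps()
lmp.file(sys.argv[1])
lmp.close()
"""


def run_isolated(script, *args):
    """Run `script` in a fresh Python interpreter and report how it ended.

    Returns (ok, description). A crash in the child, including a
    segmentation fault, does not affect the calling process.
    """
    proc = subprocess.run([sys.executable, "-c", script, *args])
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        return False, f"killed by {signal.Signals(-proc.returncode).name}"
    if proc.returncode != 0:
        return False, f"exited with status {proc.returncode}"
    return True, "ok"
```

Calling `run_isolated(CHILD_SCRIPT, "in.example")` (with a placeholder input file name) would then return a failure description instead of crashing the parent when the child receives a segmentation fault.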
I didn’t investigate the crashes further and attributed them to quirks of the force field.
However, my attempts to reproduce the crashes by running the LAMMPS binary directly never produced similar crashes, which made the problem harder to track down.
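Revisiting the stack trace, I see that the crash occurs during box creation (CreateBox::command), while LAMMPS maps the processor grid through MPI_Cart_create, and that it ends in a memory access fault inside libpsm2. I do not understand the underlying process well enough to interpret this, so here is the stack trace: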
[device] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7fffc804ded8)
==== backtrace (tid: 290652) ====
0 /lib/libucs.so.0(ucs_handle_error+0x254) [0x7fffb56b6b94]
1 /lib/libucs.so.0(+0x27d4c) [0x7fffb56b6d4c]
2 /lib/libucs.so.0(+0x27ff8) [0x7fffb56b6ff8]
3 /lib64/libpsm2.so.2(+0x2750c) [0x7fffc07bd50c]
4 /lib64/libpsm2.so.2(+0x258aa) [0x7fffc07bb8aa]
5 /lib64/libpsm2.so.2(+0x1a5b4) [0x7fffc07b05b4]
6 /lib64/libpsm2.so.2(+0x24b47) [0x7fffc07bab47]
7 /lib64/libpsm2.so.2(psm2_mq_ipeek2+0x89) [0x7fffc07b3d89]
8 /openmpi4-gnu12/4.1.4/lib/openmpi/mca_mtl_psm2.so(ompi_mtl_psm2_progress+0x61) [0x7fffb5276791]
9 /openmpi4-gnu12/4.1.4/lib/libopen-pal.so.40(opal_progress+0x2c) [0x7fffcc892f2c]
10 /openmpi4-gnu12/4.1.4/lib/libopen-pal.so.40(ompi_sync_wait_mt+0x10d) [0x7fffcc89962d]
11 /openmpi4-gnu12/4.1.4/lib/libmpi.so.40(ompi_comm_nextcid+0x169) [0x7fffcd24a3e9]
12 /openmpi4-gnu12/4.1.4/lib/libmpi.so.40(ompi_comm_enable+0x49) [0x7fffcd245a79]
13 /openmpi4-gnu12/4.1.4/lib/libmpi.so.40(mca_topo_base_cart_create+0x1cc) [0x7fffcd2f3dbc]
14 /openmpi4-gnu12/4.1.4/lib/libmpi.so.40(MPI_Cart_create+0x222) [0x7fffcd27d532]
15 /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS7ProcMap8cart_mapEiPiS1_PA2_iPPS1_+0x57) [0x7fffef8fe1d7]
16 /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS4Comm13set_proc_gridEi+0x973) [0x7fffef56bc03]
17 /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS9CreateBox7commandEiPPc+0xd68) [0x7fffefa3b3c8]
18 /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS5Input15execute_commandEv+0x712) [0x7fffefacdd82]
19 /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS5Input4fileEv+0x177) [0x7fffeface557]
20 /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS5Input7includeEv+0xee) [0x7fffefacebbe]
21 /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS5Input15execute_commandEv+0x7d0) [0x7fffefacde40]
22 /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS5Input4fileEv+0x177) [0x7fffeface557]
23 /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS5Input4fileEPKc+0xc2) [0x7fffeface942]
24 /python3.11/site-packages/lammps/liblammps.so(lammps_file+0x23) [0x7fffef3da763]
25 /lib64/libffi.so.6(ffi_call_unix64+0x4c) [0x7fffd098214e]
26 /lib64/libffi.so.6(ffi_call+0x36f) [0x7fffd0981aff]
27 /python/3.11.2/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so(+0xca4d) [0x7fffd0b91a4d]
28 /python/3.11.2/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so(+0x8340) [0x7fffd0b8d340]
29 python3(_PyObject_MakeTpCall+0x6b) [0x50469b]
30 python3(_PyEval_EvalFrameDefault+0x6c8) [0x562808]
31 python3() [0x56136b]
32 python3(_PyEval_EvalFrameDefault+0x3c1d) [0x565d5d]
33 python3() [0x56136b]
34 python3(_PyEval_EvalFrameDefault+0x3c1d) [0x565d5d]
35 python3() [0x56136b]
36 python3(_PyObject_Call_Prepend+0xd3) [0x505543]
37 python3() [0x53d254]
38 python3() [0x53ae43]
39 python3(_PyObject_MakeTpCall+0x6b) [0x50469b]
40 python3(_PyEval_EvalFrameDefault+0x6c8) [0x562808]
41 python3() [0x56136b]
42 python3(PyEval_EvalCode+0x93) [0x5dc913]
43 python3() [0x5f0867]
44 python3() [0x5f07ff]
45 python3() [0x5f0fa2]
46 python3(_PyRun_SimpleFileObject+0x190) [0x5f0ce0]
47 python3(_PyRun_AnyFileObject+0x44) [0x5f0984]
48 python3(Py_RunMain+0x2a4) [0x5f8424]
49 python3(Py_BytesMain+0x27) [0x5f8077]
50 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7ffff709acf3]
51 python3(_start+0x2e) [0x59639e]
=================================
[device] *** Process received signal ***
[device] Signal: Segmentation fault (11)
[device] Signal code: (-6)
[device] Failing at address: 0x45900046f5c
[device] [ 0] /lib64/libpthread.so.0(+0x12ce0)[0x7ffff7bc1ce0]
[device] [ 1] /lib64/libpsm2.so.2(+0x2750c)[0x7fffc07bd50c]
[device] [ 2] /lib64/libpsm2.so.2(+0x258aa)[0x7fffc07bb8aa]
[device] [ 3] /lib64/libpsm2.so.2(+0x1a5b4)[0x7fffc07b05b4]
[device] [ 4] /lib64/libpsm2.so.2(+0x24b47)[0x7fffc07bab47]
[device] [ 5] /lib64/libpsm2.so.2(psm2_mq_ipeek2+0x89)[0x7fffc07b3d89]
[device] [ 6] /openmpi4-gnu12/4.1.4/lib/openmpi/mca_mtl_psm2.so(ompi_mtl_psm2_progress+0x61)[0x7fffb5276791]
[device] [ 7] /openmpi4-gnu12/4.1.4/lib/libopen-pal.so.40(opal_progress+0x2c)[0x7fffcc892f2c]
[device] [ 8] /openmpi4-gnu12/4.1.4/lib/libopen-pal.so.40(ompi_sync_wait_mt+0x10d)[0x7fffcc89962d]
[device] [ 9] /openmpi4-gnu12/4.1.4/lib/libmpi.so.40(ompi_comm_nextcid+0x169)[0x7fffcd24a3e9]
[device] [10] /openmpi4-gnu12/4.1.4/lib/libmpi.so.40(ompi_comm_enable+0x49)[0x7fffcd245a79]
[device] [11] /openmpi4-gnu12/4.1.4/lib/libmpi.so.40(mca_topo_base_cart_create+0x1cc)[0x7fffcd2f3dbc]
[device] [12] /openmpi4-gnu12/4.1.4/lib/libmpi.so.40(MPI_Cart_create+0x222)[0x7fffcd27d532]
[device] [13] /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS7ProcMap8cart_mapEiPiS1_PA2_iPPS1_+0x57)[0x7fffef8fe1d7]
[device] [14] /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS4Comm13set_proc_gridEi+0x973)[0x7fffef56bc03]
[device] [15] /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS9CreateBox7commandEiPPc+0xd68)[0x7fffefa3b3c8]
[device] [16] /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS5Input15execute_commandEv+0x712)[0x7fffefacdd82]
[device] [17] /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS5Input4fileEv+0x177)[0x7fffeface557]
[device] [18] /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS5Input7includeEv+0xee)[0x7fffefacebbe]
[device] [19] /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS5Input15execute_commandEv+0x7d0)[0x7fffefacde40]
[device] [20] /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS5Input4fileEv+0x177)[0x7fffeface557]
[device] [21] /python3.11/site-packages/lammps/liblammps.so(_ZN9LAMMPS_NS5Input4fileEPKc+0xc2)[0x7fffeface942]
[device] [22] /python3.11/site-packages/lammps/liblammps.so(lammps_file+0x23)[0x7fffef3da763]
[device] [23] /lib64/libffi.so.6(ffi_call_unix64+0x4c)[0x7fffd098214e]
[device] [24] /lib64/libffi.so.6(ffi_call+0x36f)[0x7fffd0981aff]
[device] [25] /python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so(+0xca4d)[0x7fffd0b91a4d]
[device] [26] /python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so(+0x8340)[0x7fffd0b8d340]
[device] [27] python3(_PyObject_MakeTpCall+0x6b)[0x50469b]
[device] [28] python3(_PyEval_EvalFrameDefault+0x6c8)[0x562808]
[device] [29] python3[0x56136b]
[device] *** End of error message ***
While the problem seemed negligible at the outset, it came to a head when every LAMMPS instance that I launched on a different HPC platform crashed with a similar stack trace. I would appreciate any insight into this behaviour.
Cheers