Hello everyone,
I’m encountering an error when running LAMMPS in hybrid MPI + OpenMP mode, specifically when using more than one processor in the z-direction. My command line is:
mpirun -np 8 \
./lmp -var nX 2 -var nY 2 -var nZ 2 -var nNp 4 -var ompTh 5 -in in.pour.toyoura.CDSS
In my LAMMPS input script, I specify:
processors ${nX} ${nY} ${nZ} numa_nodes ${nNp}
package omp ${ompTh} neigh yes
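With nZ = 2 the job uses 8 ranks × 5 threads = 40 threads on the 48 cores, so it should not be oversubscribed. For reference, the environment I launch with looks roughly like this (a sketch; OMP_NUM_THREADS is the standard OpenMP control, and I_MPI_PIN_DOMAIN=omp is the Intel MPI pinning option that I believe applies to an oneAPI build):
export OMP_NUM_THREADS=5      # kept consistent with ompTh; package omp sets the thread count inside LAMMPS anyway
export I_MPI_PIN_DOMAIN=omp   # give each rank its own domain of OMP_NUM_THREADS cores (Intel MPI)
mpirun -np 8 \
./lmp -var nX 2 -var nY 2 -var nZ 2 -var nNp 4 -var ompTh 5 -in in.pour.toyoura.CDSS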
This configuration produces the following error:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 302042 RUNNING AT HP-Z8
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
[... identical “KILLED BY SIGNAL: 9 (Killed)” messages for ranks 1-3 and 5-7 (PIDs 302043-302049) ...]
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 4 PID 302046 RUNNING AT HP-Z8
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
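Since seven of the eight ranks die with signal 9 (SIGKILL), I wondered whether the kernel OOM killer is involved. Right after a crash I can check (a sketch; the grep pattern is simply what I would expect the kernel to log):
$ dmesg | grep -iE 'out of memory|oom-killer|killed process'
$ free -h   # memory headroom immediately after the run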
However, if I reduce the number of processors along the z-direction to 1 (nZ = 1), the simulation runs without any issues. For example:
mpirun -np 4 \
./lmp -var nX 2 -var nY 2 -var nZ 1 -var nNp 4 -var ompTh 12 -in in.pour.toyoura.CDSS
runs successfully (4 ranks × 12 OpenMP threads = 48 threads, matching the machine’s 48 cores).
Below is some system information and additional details (LAMMPS was built with Intel oneAPI):
= System Information
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz
CPU family: 6
Model: 85
Thread(s) per core: 1
Core(s) per socket: 24
Socket(s): 2
Stepping: 7
= NUMA Configuration
$ numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 7 8 12 13 14 18 19 20
node 0 size: 31893 MB
node 0 free: 24640 MB
node 1 cpus: 4 5 6 9 10 11 15 16 17 21 22 23
node 1 size: 32251 MB
node 1 free: 29506 MB
node 2 cpus: 24 25 26 27 31 32 33 37 38 39 43 44
node 2 size: 32208 MB
node 2 free: 29176 MB
node 3 cpus: 28 29 30 34 35 36 40 41 42 45 46 47
node 3 size: 32248 MB
node 3 free: 28476 MB
node distances:
node   0   1   2   3
  0:  10  11  21  21
  1:  11  10  21  21
  2:  21  21  10  11
  3:  21  21  11  10
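To rule out NUMA placement as the cause, one test I can try is confining a small two-rank run to a single NUMA node (a sketch using numactl’s --cpunodebind/--membind options; the variable values are just illustrative):
mpirun -np 2 numactl --cpunodebind=0 --membind=0 \
./lmp -var nX 1 -var nY 1 -var nZ 2 -var nNp 1 -var ompTh 1 -in in.pour.toyoura.CDSS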
= LAMMPS Information
$ ./lmp -help
Large-scale Atomic/Molecular Massively Parallel Simulator - 29 Aug 2024 - Update 1
Git info (stable / stable_29Aug2024_update1)
Installed packages:
EXTRA-FIX GRANULAR INTEL MOLECULE OPENMP PYTHON RIGID VTK
I have tried various grid configurations for the processors command (e.g., grid onelevel, grid numa, grid twolevel) and different values for nZ, nNp, and ompTh.
Unfortunately, whenever I set nZ to 2 or higher, I encounter a “BAD TERMINATION” error (signal 9 or 11).
These runs fail:
mpirun -np 8 \
./lmp -var nX 2 -var nY 2 -var nZ 2 -var nNp 4 -var ompTh 1 -in in.pour.toyoura.CDSS
mpirun -np 2 \
./lmp -var nX 1 -var nY 1 -var nZ 2 -var nNp 2 -var ompTh 5 -in in.pour.toyoura.CDSS
processors 2 2 2 grid onelevel
processors * * * grid numa
processors * * * grid twolevel 8 2 2 2
processors * * * grid twolevel 4 2 2 1
But these run without problems:
mpirun -np 4 \
./lmp -var nX 2 -var nY 2 -var nZ 1 -var nNp 4 -var ompTh 1 -in in.pour.toyoura.CDSS
mpirun -np 8 \
./lmp -var nX 4 -var nY 2 -var nZ 1 -var nNp 4 -var ompTh 1 -in in.pour.toyoura.CDSS
processors 2 2 1 grid onelevel # this works
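Since rank 4 dies with a segmentation fault, I also wondered about per-thread stack limits. Before launching I can try (a sketch; OMP_STACKSIZE is the standard OpenMP control, and 512m is just a test value):
$ ulimit -s unlimited        # lift the shell’s stack limit for the launched ranks
$ export OMP_STACKSIZE=512m  # enlarge each OpenMP thread’s stack (test value)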
Has anyone seen a similar issue, or does anyone have suggestions about what might be causing this error?
Could it be related to memory, domain decomposition, or the NUMA configuration? Any guidance or troubleshooting tips would be greatly appreciated.
Thank you very much!