What are the best practices to find where a segmentation fault might come from?

Hi everyone,

I am wondering what (in your experience) are the best practices for debugging where a ReaxFF MD segmentation fault might come from?
What checklist would you follow to find the exact setup, parameter, or other factor that is causing the segmentation fault?

Here is an input deck that generates a segmentation fault after around a few thousand steps. It runs on Ubuntu 22 with 128 cores and 100 GB of memory and crashes within a few minutes.

Thank you in advance.
CHONSSiNaPX.reax (26.7 KB)
HPO4_x30 (7.2 KB)
in_1slab.input (2.0 KB)
substrate_1slab2.lmpdat (97.4 KB)

When in doubt, it helps to first check the LAMMPS manual: 11.4. Debugging crashes — LAMMPS documentation

That setup requires far too many resources for meaningful debugging.
I do my best debugging when I can reproduce the crash with 8 cores in under a minute.
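
For what it's worth, once you have a reproducer that small, a quick first step is to run it under gdb and grab a stack trace. This is a minimal sketch: the reduced input name in_small.input is a placeholder, and it assumes LAMMPS was compiled with debug symbols (e.g. -DCMAKE_BUILD_TYPE=Debug):

# launch the serial reproducer under the debugger
gdb --args ./lmp -in in_small.input
# inside gdb: run until the segfault, then print the stack trace
(gdb) run
(gdb) bt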

I just happened to have access to a server with 64 cores, and there the input ran to completion:

LAMMPS (12 Jun 2025 - Development - patch_12Jun2025-533-gcdded2d51c)
KOKKOS mode with Kokkos version 4.6.1 is enabled
  using 1 OpenMP thread(s) per MPI task
package kokkos
# Created by charmm2lammps v1.8.1 on Fri Jan 28 00:27:31 EST 2022

[...]

Loop time of 552.961 on 64 procs for 100000 steps with 2260 atoms

Performance: 0.781 ns/day, 30.720 hours/ns, 180.845 timesteps/s, 408.709 katom-step/s
91.0% CPU use with 64 MPI tasks x 1 OpenMP threads

This is the current LAMMPS development version using the KOKKOS implementation of ReaxFF on the CPU, without threads, launched as follows:

mpirun -np 64 env OMP_NUM_THREADS=1 OMP_PROC_BIND=false ./lmp -in in_1slab.input -k on t 1 -sf kk
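
For reference, a minimal sketch of how such a binary can be built with CMake (the source path and the -j value are placeholders; PKG_REAXFF, PKG_KOKKOS, and Kokkos_ENABLE_OPENMP are the standard LAMMPS CMake options):

cd lammps && mkdir build && cd build
# enable ReaxFF plus the KOKKOS package with the OpenMP backend (CPU)
cmake -D PKG_REAXFF=yes -D PKG_KOKKOS=yes -D Kokkos_ENABLE_OPENMP=yes ../cmake
cmake --build . -j 8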

Thank you for your support.

I’ll give that a try and see if I can reproduce it on my own HPC system.
So do you recommend just using the KOKKOS package for my ReaxFF runs?

The KOKKOS package version has more robust memory management (required by the port to KOKKOS) and should be a little faster, too, since it has been optimized more.
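
If you want to check this on your own machine, a minimal comparison would be to run the same input with and without the KOKKOS suffix on a small core count (8 MPI ranks here, matching the debugging advice above; this assumes the binary was built with both the REAXFF and KOKKOS packages):

# plain (non-KOKKOS) ReaxFF
mpirun -np 8 ./lmp -in in_1slab.input
# KOKKOS ReaxFF on the CPU, 1 thread per MPI task
mpirun -np 8 ./lmp -in in_1slab.input -k on t 1 -sf kk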
