Segmentation fault when running LAMMPS with NequIP and KOKKOS (GPU)

I am encountering a persistent segmentation fault when trying to run an MD simulation using LAMMPS with the NequIP potential and the KOKKOS package for GPU acceleration.

Problem Description:
The simulation runs without any issues on CPUs (without the KOKKOS package). However, when I enable KOKKOS to use a single NVIDIA GPU, it crashes immediately at the first run command with a Segmentation fault (core dumped).

System & Software Environment:

  • OS: [Your OS, e.g., Ubuntu 22.04]
  • Compiler: GCC [Your version, e.g., 11.4]
  • CUDA: 12.4
  • MPI: [Your MPI, e.g., OpenMPI 4.1.5]
  • GPU: [Your GPU model, e.g., NVIDIA A100 32GB]
  • LAMMPS Version: 11 Feb 2026
  • NequIP Plugin: [Version/Source, e.g., built from the official GitHub repository]

Steps to Reproduce:
I have attached a minimal input script (in.equilibrate) that replicates the issue. The key steps are:

  1. Read initial data file for BaTiO3.
  2. Replicate the system to 3645 atoms.
  3. Set pair_style allegro.
  4. Run NVT equilibration.
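
As a quick sanity check on step 2, the replicated system size follows from the 135 atoms read from the data file (see the read_data output in the log further down) and a replication factor of 3 along each axis. A minimal Python sketch of that arithmetic follows; the function names are illustrative only and are not part of LAMMPS.

```python
def replicated_atom_count(base_atoms: int, nx: int, ny: int, nz: int) -> int:
    """Atoms after `replicate nx ny nz` of a cell holding base_atoms atoms."""
    return base_atoms * nx * ny * nz


def replicated_box(lengths, nx: int, ny: int, nz: int):
    """Edge lengths of an orthogonal box after replication."""
    lx, ly, lz = lengths
    return (lx * nx, ly * ny, lz * nz)


if __name__ == "__main__":
    # Values taken from the read_data output in the log: 135 atoms,
    # box edges 11.898254 x 11.898254 x 12.912077 (metal units, Angstrom).
    print(replicated_atom_count(135, 3, 3, 3))
    print(replicated_box((11.898254, 11.898254, 12.912077), 3, 3, 3))
```

The atom count and box edges printed here can be compared against the "replicate" lines of the log to confirm that the system was set up as intended before the crash.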

My Build Configuration:
I built LAMMPS with the following CMake command. The key flags include -DPKG_KOKKOS=ON, -DKokkos_ENABLE_CUDA=ON, -DKokkos_ARCH_VOLTA70=ON, and -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" to match the ABI of my PyTorch/LibTorch installation.

conda activate allegro-new
TORCH_CMAKE=$(python -c 'import torch; print(torch.utils.cmake_prefix_path)') && \
cmake ../cmake \
  -DCMAKE_PREFIX_PATH="${TORCH_CMAKE}" \
  -DPKG_KOKKOS=ON \
  -DKokkos_ENABLE_CUDA=ON \
  -DKokkos_ARCH_VOLTA70=ON \
  -DKokkos_ENABLE_SERIAL=ON \
  -DKokkos_ENABLE_OPENMP=ON \
  -DNEQUIP_AOT_COMPILE=ON \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.4 \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.4/bin/nvcc \
  -DMKL_INCLUDE_DIR=/usr/include/mkl \
  -DMKL_LIBRARIES="-lmkl_intel_lp64 -lmkl_sequential -lmkl_core" \
  -DKokkos_ENABLE_PROFILING=ON \
  -DCMAKE_CXX_STANDARD=17 \
  -DCMAKE_CXX_EXTENSIONS=OFF \
  -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" \
  -DCMAKE_CUDA_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" \
  -DKokkos_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0"
make -j$(nproc)
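
One thing that may be worth ruling out with a build like this is a mismatch between the Kokkos architecture option (here -DKokkos_ARCH_VOLTA70=ON) and the compute capability of the GPU actually used. The GPU model is not stated in this post, so the following is only a sketch of such a check and not a diagnosis of the crash. The lookup table is deliberately partial, and the helper names expected_arch_flag and arch_flag_matches are hypothetical; only the Kokkos_ARCH_* option names come from Kokkos itself.

```python
# Partial map from CUDA compute capability to the Kokkos CMake arch option.
# Extend it for other GPUs as needed; it is not exhaustive.
KOKKOS_ARCH_BY_CC = {
    "6.0": "Kokkos_ARCH_PASCAL60",
    "7.0": "Kokkos_ARCH_VOLTA70",
    "7.5": "Kokkos_ARCH_TURING75",
    "8.0": "Kokkos_ARCH_AMPERE80",
    "8.6": "Kokkos_ARCH_AMPERE86",
    "9.0": "Kokkos_ARCH_HOPPER90",
}


def expected_arch_flag(compute_capability: str) -> str:
    """Return the Kokkos arch option name for a compute capability like '7.0'."""
    try:
        return KOKKOS_ARCH_BY_CC[compute_capability]
    except KeyError:
        raise ValueError(
            f"compute capability {compute_capability!r} is not in the table"
        ) from None


def arch_flag_matches(cmake_args, compute_capability: str) -> bool:
    """True if cmake_args switches on the arch option expected for this GPU."""
    wanted = f"-D{expected_arch_flag(compute_capability)}=ON"
    return wanted in cmake_args


if __name__ == "__main__":
    # Subset of the flags from the build command above.
    args = ["-DPKG_KOKKOS=ON", "-DKokkos_ENABLE_CUDA=ON", "-DKokkos_ARCH_VOLTA70=ON"]
    # Replace "7.0" with the compute capability of the GPU in the node.
    print(arch_flag_matches(args, "7.0"))
```

If the check reports a mismatch for the real GPU, rebuilding with the matching Kokkos_ARCH_* option would be a reasonable first experiment before digging further.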
Full error log:
The complete output leading to the segmentation fault is shown below.
Internal error!
Thu Apr 16 20:39:56 CST 2026
Writing to /root/.config/pip/pip.conf
LAMMPS (11 Feb 2026)
KOKKOS mode with Kokkos version 5.0.2 is enabled
  using double precision
  using view layout = legacy
  will use up to 1 GPU(s) per node
Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set
  In general, for best performance with OpenMP 4.0 or better set OMP_PROC_BIND=spread and OMP_PLACES=threads
  For best performance with OpenMP 3.1 set OMP_PROC_BIND=true
  For unit testing set OMP_PROC_BIND=false

  using 1 OpenMP thread(s) per MPI task
package kokkos
package kokkos newton on neigh half
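# Equilibration script using the new model (original version from the official repository, with only the necessary fixes)
variable L index 3              # 3×3×3 → 9×9×9 = 3645 atoms (figure b in the paper)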
variable STRUCTURE index BaTiO3_init
variable PERIOD index 100000
variable EQUILSTEPS index 100000  # official 200 ps = 100000 steps (dt=2fs)
variable RESTART_FREQ index 10000
variable TEMP index 300
variable MODEL index BaTiO3.nequip.pt2  # your model
variable SEED index 12345
variable ITER index 1

# Log path
log Logs_${ITER}/log.equilibrate_${MODEL}_${STRUCTURE}_${TEMP}_$L_${ITER}_${PERIOD}_00002fs
log Logs_1/log.equilibrate_${MODEL}_${STRUCTURE}_${TEMP}_$L_${ITER}_${PERIOD}_00002fs
log Logs_1/log.equilibrate_BaTiO3.nequip.pt2_${STRUCTURE}_${TEMP}_$L_${ITER}_${PERIOD}_00002fs
log Logs_1/log.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_${TEMP}_$L_${ITER}_${PERIOD}_00002fs
log Logs_1/log.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_$L_${ITER}_${PERIOD}_00002fs
log Logs_1/log.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_3_${ITER}_${PERIOD}_00002fs
log Logs_1/log.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_3_1_${PERIOD}_00002fs
log Logs_1/log.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_3_1_100000_00002fs

units		metal
atom_style	atomic

read_data ${STRUCTURE}.data
read_data BaTiO3_init.data
Reading data file ...
  orthogonal box = (0 0 0) to (11.898254 11.898254 12.912077)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  135 atoms
  read_data CPU = 0.016 seconds
replicate $L $L $L
replicate 3 $L $L
replicate 3 3 $L
replicate 3 3 3
Replication is creating a 3x3x3 = 27 times larger system...
  orthogonal box = (0 0 0) to (35.694762 35.694762 38.736231)
  1 by 1 by 1 MPI processor grid
  3645 atoms
  replicate CPU = 0.006 seconds

# ==============================
# Kept exactly as in your repository's original version!!
# Atom order: Ba O Ti, do not change!!
# ==============================
pair_style allegro
NequIP/Allegro is using input precision d and output precision d
pair_coeff * * ${MODEL} Ba O Ti
pair_coeff * * BaTiO3.nequip.pt2 Ba O Ti
NequIP/Allegro: Loading model from BaTiO3.nequip.pt2
Type mapping:
NequIP/Allegro type | NequIP/Allegro name | LAMMPS type | LAMMPS name
0 | Ba | 1 | Ba
1 | Ti | 3 | Ti
2 | O | 2 | O
ti=0 tj=0 cut=5.00
ti=0 tj=1 cut=5.00
ti=0 tj=2 cut=5.00
ti=1 tj=0 cut=5.00
ti=1 tj=1 cut=5.00
ti=1 tj=2 cut=5.00
ti=2 tj=0 cut=5.00
ti=2 tj=1 cut=5.00
ti=2 tj=2 cut=5.00

mass 1 137.3
mass 2 15.9994
mass 3 47.9

timestep 0.002

# ==============================
# compute syntax kept exactly as in your repository's original version!
# ==============================
compute polarization all allegro polarization 3
compute allegro will evaluate the quantity polarization of length 3
compute polarizability all allegro polarizability 9
compute allegro will evaluate the quantity polarizability of length 9
compute borncharges all allegro/atom born_charge 9 1
compute allegro/atom will evaluate the quantity born_charge of length 9 with newton 1

thermo_style custom pe fmax fnorm spcpu cpuremain

variable efield equal 1e-2*1.5
fix born all addbornforce 0.0 0.0 ${efield}
fix born all addbornforce 0.0 0.0 0.015

restart ${RESTART_FREQ} ./Restarts_${ITER}/restart.equilibrate_${MODEL}_${STRUCTURE}_${TEMP}_$L_${ITER}_${PERIOD}.*
restart 10000 ./Restarts_${ITER}/restart.equilibrate_${MODEL}_${STRUCTURE}_${TEMP}_$L_${ITER}_${PERIOD}.*
restart 10000 ./Restarts_1/restart.equilibrate_${MODEL}_${STRUCTURE}_${TEMP}_$L_${ITER}_${PERIOD}.*
restart 10000 ./Restarts_1/restart.equilibrate_BaTiO3.nequip.pt2_${STRUCTURE}_${TEMP}_$L_${ITER}_${PERIOD}.*
restart 10000 ./Restarts_1/restart.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_${TEMP}_$L_${ITER}_${PERIOD}.*
restart 10000 ./Restarts_1/restart.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_$L_${ITER}_${PERIOD}.*
restart 10000 ./Restarts_1/restart.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_3_${ITER}_${PERIOD}.*
restart 10000 ./Restarts_1/restart.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_3_1_${PERIOD}.*
restart 10000 ./Restarts_1/restart.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_3_1_100000.*

thermo 10
velocity all create ${TEMP} ${SEED} dist gaussian rot yes mom yes
velocity all create 300 ${SEED} dist gaussian rot yes mom yes
velocity all create 300 12345 dist gaussian rot yes mom yes
fix nvt all nvt temp ${TEMP} ${TEMP} $(100*dt)
fix nvt all nvt temp 300 ${TEMP} $(100*dt)
fix nvt all nvt temp 300 300 $(100*dt)
fix nvt all nvt temp 300 300 0.2000000000000000111
run ${EQUILSTEPS}
run 100000

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- KOKKOS package: https://doi.org/10.1145/3731599.3767498
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 7
  ghost atom cutoff = 7
  binsize = 7, bins = 6 6 6
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair allegro/kk, perpetual
      attributes: full, newton on, kokkos_device
      pair build: full/bin/kk/device
      stencil: full/bin/3d
      bin: kk/device
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.002
/input_lbg-428977-22468738/lbg-428977-22468738.sh: line 5: 50554 Segmentation fault      ./lmp -sf kk -k on g 1 t 1 -pk kokkos newton on neigh half -in in.equilibrate -echo screen

This pair style is not part of LAMMPS; it is developed and maintained independently, so you are asking in the wrong place. Unless you can reproduce the segfault with a pair style that is part of LAMMPS, there is very little that we can do for you. You will have to contact the developers of that potential and ask them for assistance.

Thank you for your clarification. I understand that the pair_style allegro is not part of the LAMMPS distribution. I will contact the NequIP/Allegro developers for further assistance. Thanks again for your time and help.