I am encountering a persistent segmentation fault when trying to run an MD simulation using LAMMPS with the NequIP potential and the KOKKOS package for GPU acceleration.
Problem Description:
The simulation runs without any issues on CPUs (without the KOKKOS package). However, when I enable KOKKOS to use a single NVIDIA GPU, it crashes immediately at the first run command with a Segmentation fault (core dumped).
System & Software Environment:
- OS: [Your OS, e.g., Ubuntu 22.04]
- Compiler: GCC [Your version, e.g., 11.4]
- CUDA: 12.4
- MPI: [Your MPI, e.g., OpenMPI 4.1.5]
- GPU: [Your GPU model, e.g., NVIDIA A100 32GB]
- LAMMPS Version: 11 Feb 2026
- NequIP Plugin: [Version/Source, e.g., built from the official GitHub repository]
Steps to Reproduce:
I have attached a minimal input script (in.equilibrate) that replicates the issue. The key steps are:
- Read the initial data file for BaTiO3.
- Replicate the system to 3645 atoms.
- Set pair_style allegro.
- Run NVT equilibration.
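As a quick sanity check on the replication step, the atom count and box lengths reported in the log follow directly from the 135-atom data file and a replication factor of 3. A minimal sketch in Python, with the input values copied from the log output; the function name `replicate` is purely illustrative and is not part of LAMMPS:

```python
def replicate(n_atoms, box, factor):
    """Return (atom count, box lengths) after replicating `factor` times along each axis."""
    new_count = n_atoms * factor**3
    new_box = tuple(round(length * factor, 6) for length in box)
    return new_count, new_box


# Values taken from the read_data output in the log (135 atoms, orthogonal box).
n_atoms, box = replicate(135, (11.898254, 11.898254, 12.912077), 3)
print(n_atoms, box)
```

The result should agree with the "3645 atoms" line and the replicated box bounds that LAMMPS prints after the replicate command.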
My Build Configuration:
I built LAMMPS with the following CMake command. The key flags include -DPKG_KOKKOS=ON, -DKokkos_ENABLE_CUDA=ON, -DKokkos_ARCH_VOLTA70=ON, and -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" to match the ABI of my PyTorch/LibTorch installation.
conda activate allegro-new
TORCH_CMAKE=$(python -c 'import torch; print(torch.utils.cmake_prefix_path)') && \
cmake ../cmake \
  -DCMAKE_PREFIX_PATH="${TORCH_CMAKE}" \
  -DPKG_KOKKOS=ON \
  -DKokkos_ENABLE_CUDA=ON \
  -DKokkos_ARCH_VOLTA70=ON \
  -DKokkos_ENABLE_SERIAL=ON \
  -DKokkos_ENABLE_OPENMP=ON \
  -DNEQUIP_AOT_COMPILE=ON \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.4 \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.4/bin/nvcc \
  -DMKL_INCLUDE_DIR=/usr/include/mkl \
  -DMKL_LIBRARIES="-lmkl_intel_lp64 -lmkl_sequential -lmkl_core" \
  -DKokkos_ENABLE_PROFILING=ON \
  -DCMAKE_CXX_STANDARD=17 \
  -DCMAKE_CXX_EXTENSIONS=OFF \
  -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" \
  -DCMAKE_CUDA_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" \
  -DKokkos_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0"
make -j$(nproc)
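The value 0 in the `_GLIBCXX_USE_CXX11_ABI` flags is only appropriate if the installed PyTorch was itself built with the pre-C++11 libstdc++ ABI. One way to confirm this is sketched below, assuming torch is importable in the allegro-new environment. `torch.compiled_with_cxx11_abi()` is an existing PyTorch call; `abi_define` is a hypothetical helper name used only for illustration.

```python
def abi_define(uses_cxx11_abi: bool) -> str:
    """Build the compiler define that matches a given libstdc++ ABI setting."""
    return f"-D_GLIBCXX_USE_CXX11_ABI={int(uses_cxx11_abi)}"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        # Assumption: the script is meant to run inside the environment that holds PyTorch.
        print("torch is not importable in this environment")
    else:
        # Reports whether this PyTorch build uses the C++11 ABI.
        print(abi_define(torch.compiled_with_cxx11_abi()))
```

If the printed define ends in 1 rather than 0, the flags passed to CMake above would not match the PyTorch build, which may be worth ruling out before looking elsewhere.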
Full error log:
The complete output leading to the segmentation fault is shown below.
Internal error!
Thu Apr 16 20:39:56 CST 2026
Writing to /root/.config/pip/pip.conf
LAMMPS (11 Feb 2026)
KOKKOS mode with Kokkos version 5.0.2 is enabled
using double precision
using view layout = legacy
will use up to 1 GPU(s) per node
Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set
In general, for best performance with OpenMP 4.0 or better set OMP_PROC_BIND=spread and OMP_PLACES=threads
For best performance with OpenMP 3.1 set OMP_PROC_BIND=true
For unit testing set OMP_PROC_BIND=false
using 1 OpenMP thread(s) per MPI task
package kokkos
package kokkos newton on neigh half
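# Equilibration script using the new model (original version from the official repository, with only the necessary fixes)
variable L index 3 # 3×3×3 → 9×9×9 = 3645 atoms (figure b in the paper)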
variable STRUCTURE index BaTiO3_init
variable PERIOD index 100000
variable EQUILSTEPS index 100000 # official 200 ps = 100000 steps (dt = 2 fs)
variable RESTART_FREQ index 10000
variable TEMP index 300
variable MODEL index BaTiO3.nequip.pt2 # your model
variable SEED index 12345
variable ITER index 1
# Log path
log Logs_${ITER}/log.equilibrate_${MODEL}_${STRUCTURE}_${TEMP}_$L_${ITER}_${PERIOD}_00002fs
log Logs_1/log.equilibrate_${MODEL}_${STRUCTURE}_${TEMP}_$L_${ITER}_${PERIOD}_00002fs
log Logs_1/log.equilibrate_BaTiO3.nequip.pt2_${STRUCTURE}_${TEMP}_$L_${ITER}_${PERIOD}_00002fs
log Logs_1/log.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_${TEMP}_$L_${ITER}_${PERIOD}_00002fs
log Logs_1/log.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_$L_${ITER}_${PERIOD}_00002fs
log Logs_1/log.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_3_${ITER}_${PERIOD}_00002fs
log Logs_1/log.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_3_1_${PERIOD}_00002fs
log Logs_1/log.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_3_1_100000_00002fs
units metal
atom_style atomic
read_data ${STRUCTURE}.data
read_data BaTiO3_init.data
Reading data file ...
orthogonal box = (0 0 0) to (11.898254 11.898254 12.912077)
1 by 1 by 1 MPI processor grid
reading atoms ...
135 atoms
read_data CPU = 0.016 seconds
replicate $L $L $L
replicate 3 $L $L
replicate 3 3 $L
replicate 3 3 3
Replication is creating a 3x3x3 = 27 times larger system...
orthogonal box = (0 0 0) to (35.694762 35.694762 38.736231)
1 by 1 by 1 MPI processor grid
3645 atoms
replicate CPU = 0.006 seconds
# ==============================
# Kept exactly as in the original version from your repository!!
# Atom order: Ba O Ti, do not change!!
# ==============================
pair_style allegro
NequIP/Allegro is using input precision d and output precision d
pair_coeff * * ${MODEL} Ba O Ti
pair_coeff * * BaTiO3.nequip.pt2 Ba O Ti
NequIP/Allegro: Loading model from BaTiO3.nequip.pt2
Type mapping:
NequIP/Allegro type | NequIP/Allegro name | LAMMPS type | LAMMPS name
0 | Ba | 1 | Ba
1 | Ti | 3 | Ti
2 | O | 2 | O
ti=0 tj=0 cut=5.00
ti=0 tj=1 cut=5.00
ti=0 tj=2 cut=5.00
ti=1 tj=0 cut=5.00
ti=1 tj=1 cut=5.00
ti=1 tj=2 cut=5.00
ti=2 tj=0 cut=5.00
ti=2 tj=1 cut=5.00
ti=2 tj=2 cut=5.00
mass 1 137.3
mass 2 15.9994
mass 3 47.9
timestep 0.002
# ==============================
# compute syntax kept exactly as in the original version from your repository!
# ==============================
compute polarization all allegro polarization 3
compute allegro will evaluate the quantity polarization of length 3
compute polarizability all allegro polarizability 9
compute allegro will evaluate the quantity polarizability of length 9
compute borncharges all allegro/atom born_charge 9 1
compute allegro/atom will evaluate the quantity born_charge of length 9 with newton 1
thermo_style custom pe fmax fnorm spcpu cpuremain
variable efield equal 1e-2*1.5
fix born all addbornforce 0.0 0.0 ${efield}
fix born all addbornforce 0.0 0.0 0.015
restart ${RESTART_FREQ} ./Restarts_${ITER}/restart.equilibrate_${MODEL}_${STRUCTURE}_${TEMP}_$L_${ITER}_${PERIOD}.*
restart 10000 ./Restarts_${ITER}/restart.equilibrate_${MODEL}_${STRUCTURE}_${TEMP}_$L_${ITER}_${PERIOD}.*
restart 10000 ./Restarts_1/restart.equilibrate_${MODEL}_${STRUCTURE}_${TEMP}_$L_${ITER}_${PERIOD}.*
restart 10000 ./Restarts_1/restart.equilibrate_BaTiO3.nequip.pt2_${STRUCTURE}_${TEMP}_$L_${ITER}_${PERIOD}.*
restart 10000 ./Restarts_1/restart.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_${TEMP}_$L_${ITER}_${PERIOD}.*
restart 10000 ./Restarts_1/restart.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_$L_${ITER}_${PERIOD}.*
restart 10000 ./Restarts_1/restart.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_3_${ITER}_${PERIOD}.*
restart 10000 ./Restarts_1/restart.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_3_1_${PERIOD}.*
restart 10000 ./Restarts_1/restart.equilibrate_BaTiO3.nequip.pt2_BaTiO3_init_300_3_1_100000.*
thermo 10
velocity all create ${TEMP} ${SEED} dist gaussian rot yes mom yes
velocity all create 300 ${SEED} dist gaussian rot yes mom yes
velocity all create 300 12345 dist gaussian rot yes mom yes
fix nvt all nvt temp ${TEMP} ${TEMP} $(100*dt)
fix nvt all nvt temp 300 ${TEMP} $(100*dt)
fix nvt all nvt temp 300 300 $(100*dt)
fix nvt all nvt temp 300 300 0.2000000000000000111
run ${EQUILSTEPS}
run 100000
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Your simulation uses code contributions which should be cited:
- KOKKOS package: https://doi.org/10.1145/3731599.3767498
The log file lists these citations in BibTeX format.
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Neighbor list info ...
update: every = 1 steps, delay = 0 steps, check = yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 7
ghost atom cutoff = 7
binsize = 7, bins = 6 6 6
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair allegro/kk, perpetual
attributes: full, newton on, kokkos_device
pair build: full/bin/kk/device
stencil: full/bin/3d
bin: kk/device
Setting up Verlet run ...
Unit style : metal
Current step : 0
Time step : 0.002
/input_lbg-428977-22468738/lbg-428977-22468738.sh: line 5: 50554 Segmentation fault ./lmp -sf kk -k on g 1 t 1 -pk kokkos newton on neigh half -in in.equilibrate -echo screen