CUDA illegal memory access

Hello LAMMPS developers,

I am encountering a CUDA illegal memory access when running a DEM simulation using Kokkos (CUDA, no UVM) with gran/hooke/history/kk and NEIGH_HISTORY/KK/DEVICE.

Error:

^Ck0006390@nal-005:/mnt/gs21/scratch/groups/wmich/lammpsrun/run$ mpirun -np 4 --bind-to core --map-by ppr:4:node:pe=1 -x UCX_IB_GPU_DIRECT_RDMA=yes -x UCX_WARN_UNUSED_ENV_VARS=n ~/lammps/builda100cudaawarenouvm/lmp -k on g 4 -sf kk -pk kokkos neigh half comm device -in in.settle_restart
LAMMPS (10 Dec 2025 - Development - patch_10Dec2025-120-ga51f9ba0e7)
KOKKOS mode with Kokkos version 5.0.0 is enabled
using double precision
using view layout = legacy
will use up to 4 GPU(s) per node
using 1 OpenMP thread(s) per MPI task
Reading restart file …
restart file = 10 Dec 2025, LAMMPS = 10 Dec 2025
restoring atom style sphere/kk from restart
orthogonal box = (-0.12 -0.4 -0.4) to (0.16 0.4 0.4)
1 by 2 by 2 MPI processor grid
restoring pair style gran/hooke/history/kk from restart
7223825 atoms
read_restart CPU = 5.820 seconds
Resetting peratom fix info from restart file:
fix style: wall/gran/kk, fix ID: wall_floor

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Resetting global fix info from restart file:

fix style: NEIGH_HISTORY/KK/DEVICE, fix ID: NEIGH_HISTORY_HH0
Resetting peratom fix info from restart file:
fix style: NEIGH_HISTORY/KK/DEVICE, fix ID: NEIGH_HISTORY_HH0
Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
All restart file global fix info was re-assigned
All restart file peratom fix info was re-assigned
Neighbor list info …
update: every = 1 steps, delay = 0 steps, check = yes
max neighbors/atom: 10000, page size: 100000
master list distance cutoff = 0.0025
ghost atom cutoff = 0.0025
binsize = 0.0025, bins = 113 320 320
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair gran/hooke/history/kk, perpetual
attributes: half, newton off, size, history, kokkos_device
pair build: half/bin/newtoff/size/kk/device
stencil: full/bin/3d
bin: kk/device
Setting up Verlet run …
Unit style : si
Current step : 0
Time step : 1e-08
Per MPI rank memory allocation (min/avg/max) = 3817 | 3884 | 3977 Mbytes
Step Atoms Dt Time c_fmax[1] c_fmax[2] c_fmax[3] CPU
0 7223825 1e-08 0 1.7014669e-06 0 0 0
20000 7223825 1e-08 0.0002 2.9694914e-06 1.8284306e-05 1.9164159e-05 113.87738
40000 7223825 1e-08 0.0004 2.9693954e-06 1.8282673e-05 1.9162414e-05 227.46573
60000 7223825 1e-08 0.0006 2.9692994e-06 1.828104e-05 1.916067e-05 360.13538
80000 7223825 1e-08 0.0008 2.9692034e-06 1.8279408e-05 1.9158926e-05 473.66435
100000 7223825 1e-08 0.001 2.9691074e-06 1.8277776e-05 1.9157182e-05 587.15008
120000 7223825 1e-08 0.0012 2.9690115e-06 1.8276144e-05 1.9155438e-05 718.71947
140000 7223825 1e-08 0.0014 2.9689155e-06 1.8274513e-05 1.9153695e-05 831.7781
160000 7223825 1e-08 0.0016 2.9688195e-06 1.8272881e-05 1.9151952e-05 963.32055
180000 7223825 1e-08 0.0018 2.9687235e-06 1.827125e-05 1.9150209e-05 1076.073
200000 7223825 1e-08 0.002 2.9686276e-06 1.8269619e-05 1.9148466e-05 1188.9594
220000 7223825 1e-08 0.0022 2.9685316e-06 1.8267989e-05 1.9146724e-05 1324.0336
240000 7223825 1e-08 0.0024 2.9684356e-06 1.8266358e-05 1.9144981e-05 1436.8825
260000 7223825 1e-08 0.0026 0.0070449508 0.01971426 0.0078929965 1573.9749
280000 7223825 1e-08 0.0028 2.9682437e-06 1.8274019e-05 1.914909e-05 1686.9452
300000 7223825 1e-08 0.003 0.0053569413 0.0067519438 0.010838032 1799.7158
320000 7223825 1e-08 0.0032 0.0020133844 0.0087335999 0.020193034 1932.5763
340000 7223825 1e-08 0.0034 0.0032335168 0.0027114475 0.01556825 2045.8299
cudaStreamSynchronize(stream) error( cudaErrorIllegalAddress): an illegal memory access was encountered /mnt/home/k0006390/lammps/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:157
Backtrace:
[0x178ae65]
[0x176491c]
[0x1791963]
[0x1791b37]
[0x876a15]
[0xc7756d]
[0x63c16c]
[0x4ef845]
[0x4eff4e]
[0x48252d]
[0x147eb6ad9d90]
[0x147eb6ad9e40] __libc_start_main
[0x4e40a5]
[nal-005:1782865] *** Process received signal ***
[nal-005:1782865] Signal: Aborted (6)
[nal-005:1782865] Signal code: (-6)
[nal-005:1782865] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x147eb6af2520]
[nal-005:1782865] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x147eb6b469fc]
[nal-005:1782865] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x147eb6af2476]
[nal-005:1782865] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x147eb6ad87f3]
[nal-005:1782865] [ 4] /mnt/home/k0006390/lammps/builda100cudaawarenouvm/lmp[0x1764929]
[nal-005:1782865] [ 5] /mnt/home/k0006390/lammps/builda100cudaawarenouvm/lmp[0x1791963]
[nal-005:1782865] [ 6] /mnt/home/k0006390/lammps/builda100cudaawarenouvm/lmp[0x1791b37]
[nal-005:1782865] [ 7] /mnt/home/k0006390/lammps/builda100cudaawarenouvm/lmp[0x876a15]
[nal-005:1782865] [ 8] /mnt/home/k0006390/lammps/builda100cudaawarenouvm/lmp[0xc7756d]
[nal-005:1782865] [ 9] /mnt/home/k0006390/lammps/builda100cudaawarenouvm/lmp[0x63c16c]
[nal-005:1782865] [10] /mnt/home/k0006390/lammps/builda100cudaawarenouvm/lmp[0x4ef845]
[nal-005:1782865] [11] /mnt/home/k0006390/lammps/builda100cudaawarenouvm/lmp[0x4eff4e]
[nal-005:1782865] [12] /mnt/home/k0006390/lammps/builda100cudaawarenouvm/lmp[0x48252d]
[nal-005:1782865] [13] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x147eb6ad9d90]
[nal-005:1782865] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x147eb6ad9e40]
[nal-005:1782865] [15] /mnt/home/k0006390/lammps/builda100cudaawarenouvm/lmp[0x4e40a5]
[nal-005:1782865] *** End of error message ***

You may want to research what has already been discussed on this subject, evaluate the advice given, and report back with that additional information. Here is one such example that I found: "Cuda illegal memory access(kokkos) when using multiple GPUs"

@Madan_B_K if you attach a minimal working example, I will try to debug.

Hello Stan,

I am developing a one-way coupling between SPARTA and LAMMPS (DEM). The attached code runs successfully when the bottom wall is removed, but when wall interactions are enabled, I encounter an “illegal CUDA memory access” error.

I need to check with my advisor before sharing the coupling algorithm itself. In the meantime, could you please take a look and let me know whether I am applying the wall boundary condition correctly, especially on restart? The problem appears after a couple hundred thousand iterations, when particles are interacting with the bottom wall.

# ---------------------------------------------------------
# SCRIPT 1: SETTLING PHASE (ULTRA-DENSE RANDOM FILL)
# ---------------------------------------------------------

units si
dimension 3
atom_style sphere
boundary f f f
newton off

# ---- 1. KOKKOS SETUP ----

package kokkos neigh half comm device
comm_modify vel yes

# ---- 2. DOMAIN ----

variable xlo equal -0.138
variable xhi equal 0.160
variable ylo equal -0.40
variable yhi equal 0.40
variable zlo equal -0.40
variable zhi equal 0.40
region domain block ${xlo} ${xhi} ${ylo} ${yhi} ${zlo} ${zhi}
create_box 1 domain

# ---- 3. PARTICLES & DENSE FILLING ----

variable dp equal 0.0005
variable rho_p equal 2650
variable bed_start equal 0.138
variable thickness equal 0.020
variable bed_end equal ${bed_start}+${thickness}

region bed block ${bed_start} ${bed_end} -0.40 0.40 -0.40 0.40

# Request 5000000 atoms to “flood” the region

create_atoms 1 random 5000000 482748 bed

set type 1 diameter ${dp}
set type 1 density ${rho_p}

# ---- 4. INTERACTION SETUP ----

pair_style gran/hooke/history/kk 100000.0 NULL 50.0 NULL 0.5 1
pair_coeff * *

# ---- 5. CLOSEST POSSIBLE OVERLAP CHECK ----

# Deleting at 1*dp (overlap_dist below) leaves particles no closer than touching

variable overlap_dist equal ${dp}*1
delete_atoms overlap ${overlap_dist} all all

# ---- 6. INTEGRATION & BOUNDARIES ----

# Start with a very small timestep for high-density stability

timestep 1.0e-7

fix int all nve/sphere/kk
fix g all gravity/kk 100 vector 1 0 0
fix wall_floor all wall/gran/kk hooke/history 100000.0 NULL 50.0 NULL 0.5 1 xplane NULL ${xhi}

# High damping to absorb energy from the dense initial state

fix damp all viscous 0.1

# ---- 7. SETTINGS ----

neighbor 0.002 bin
neigh_modify delay 0 every 1 check yes one 1000

compute fmax all reduce max fx fy fz
thermo 5000
thermo_style custom step atoms dt time c_fmax[*] cpu
thermo_modify lost ignore flush yes
dump 1 all custom 5000 dump/particles_settle.dump id type x y z vx vy vz fx fy fz diameter

# ---- 8. RUN SETTLING (THE QUENCH LOOP) ----

print "PHASE 1: High-Density Settling (Velocity Quench Active)"

variable i loop 40
label loop_settle

run             1000

# Reset velocities to stop particles from bouncing away from each other
velocity        all set 0.0 0.0 0.0

# Raise the timestep after iteration 10, then drop it back to 1e-7 after iteration 25
if "${i} > 10" then "timestep 1.0e-6"
if "${i} > 25" then "timestep 1e-7"

next i
jump SELF loop_settle

# ---- 9. FINAL STABILIZATION ----

timestep 2.0e-7
run 10000

# ---- 10. SAVE RESTART ----

print "Writing Restart File..."

write_restart restart/restart.settled

# ---------------------------------------------------------
# SCRIPT 2: RESTART & DRAG (Max Performance)
# ---------------------------------------------------------

units si
dimension 3
atom_style sphere
boundary f f f
newton off

package kokkos neigh half comm device
comm_modify vel yes

# ---- 1. READ RESTART ----

# When starting fresh:

read_restart restart/restart.settled
reset_timestep 0

# IF RESTARTING LATER, change the read_restart line above to:

#read_restart restart/backup.b

# ---- 3. REDEFINE INTERACTIONS ----

variable xhi equal 0.160
pair_style gran/hooke/history/kk 100000.0 NULL 50.0 NULL 0.5 1
pair_coeff * *
#fix wall_floor all wall/gran hooke/history 100000.0 NULL 50.0 NULL 0.5 1 xplane NULL ${xhi}

#fix 3 all wall/gran granular hertz/material 1e5 1e3 0.3 tangential mindlin NULL 1.0 0.5 xplane NULL ${xhi}

# ---- 4. INTEGRATION ----

timestep 1.0e-8
fix int all nve/sphere/kk
fix g all gravity/kk 9.81 vector 1 0 0

# ---- 6. SETTINGS ----

neighbor 0.002 bin
neigh_modify delay 0 every 1 check yes one 1000
compute fmax all reduce max fx fy fz
thermo 2000
thermo_style custom step atoms dt time c_fmax[*] cpu
thermo_modify lost ignore flush yes

# ---- 5. CFD DRAG FORCE ----

fix cfd_force all cfd/drag/pointcloud/kk fluent_lammps_true3d.txt

# ---- 7. RESTART CONFIGURATION (TOGGLE MODE) ----

# Alternates between 'backup.a' and 'backup.b' at the restart interval set below (every 5000 steps)

restart 5000 restart/backup.a restart/backup.b

dump vtk2 all vtk 50000 dump/particles_force*.vtp id type vx vy vz fx fy fz diameter
dump 1 all custom 5000 dump/particles_force.dump id type x y z vx vy vz fx fy fz diameter

# ---- 8. RUN ----

run 2000000
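
On the restart question, LAMMPS re-assigns a fix's saved state from a restart file only when the new input script defines a fix with the same ID and style. The log above lists wall_floor with style wall/gran/kk, so a restart script that keeps the wall would redefine it after read_restart, roughly as sketched below. This is a minimal sketch that reuses the values from Script 1 and assumes the run uses the kk suffix as in the command line above; it is not a confirmed fix for the illegal memory access.

```
# Sketch only: re-declare the wall after read_restart, reusing the fix ID and
# style recorded in the restart file (wall_floor, wall/gran/kk per the log).
read_restart restart/restart.settled
reset_timestep 0

variable xhi equal 0.160
pair_style gran/hooke/history/kk 100000.0 NULL 50.0 NULL 0.5 1
pair_coeff * *

# Same ID, style, and parameters as in Script 1, so the stored wall contact
# history can be matched to this fix.
fix wall_floor all wall/gran/kk hooke/history 100000.0 NULL 50.0 NULL 0.5 1 xplane NULL ${xhi}
```

If the wall is defined under a different ID or style, the stored per-atom wall history would presumably not be matched to it, and the log would not report that all peratom fix info was re-assigned.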