KOKKOS error (cudaErrorIllegalAddress)

Hello to LAMMPS users,

I am currently facing the error shown below while running a friction simulation using the KOKKOS package with an NVIDIA RTX 4090 GPU.

"cudaStreamSynchronize(stream) error( cudaErrorIllegalAddress): an illegal memory access was encountered /home/name/lammps-22Jul2025/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:165
Backtrace:
[0x64ad65f2d389]
[0x64ad65f09bb0]
[0x64ad65f33216]
[0x64ad65f33bb9]
[0x64ad65cf3d91]
[0x64ad65cf4138]
[0x64ad65d065f2]
[0x64ad65d08865]
[0x64ad652d62dd]
[0x64ad644b0e14]
[0x64ad63e9e727]
[0x64ad63d6737b]
[0x64ad63d67d7f]
[0x64ad63ccecb1]
[0x76a65ea2a1ca]
[0x76a65ea2a28b] __libc_start_main
[0x64ad63d5a915]
[DESKTOP-20TF71N:192096] *** Process received signal ***
[DESKTOP-20TF71N:192096] Signal: Aborted (6)
[DESKTOP-20TF71N:192096] Signal code: (-6)
[DESKTOP-20TF71N:192096] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x76a65ea45330]
[DESKTOP-20TF71N:192096] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x11c)[0x76a65ea9eb2c]
[DESKTOP-20TF71N:192096] [ 2] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x1e)[0x76a65ea4527e]
[DESKTOP-20TF71N:192096] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xdf)[0x76a65ea288ff]
[DESKTOP-20TF71N:192096] [ 4] lmp(+0x2469bbd)[0x64ad65f09bbd]
[DESKTOP-20TF71N:192096] [ 5] lmp(+0x2493216)[0x64ad65f33216]
[DESKTOP-20TF71N:192096] [ 6] lmp(+0x2493bb9)[0x64ad65f33bb9]
[DESKTOP-20TF71N:192096] [ 7] lmp(+0x2253d91)[0x64ad65cf3d91]
[DESKTOP-20TF71N:192096] [ 8] lmp(+0x2254138)[0x64ad65cf4138]
[DESKTOP-20TF71N:192096] [ 9] lmp(+0x22665f2)[0x64ad65d065f2]
[DESKTOP-20TF71N:192096] [10] lmp(+0x2268865)[0x64ad65d08865]
[DESKTOP-20TF71N:192096] [11] lmp(+0x18362dd)[0x64ad652d62dd]
[DESKTOP-20TF71N:192096] [12] lmp(+0xa10e14)[0x64ad644b0e14]
[DESKTOP-20TF71N:192096] [13] lmp(+0x3fe727)[0x64ad63e9e727]
[DESKTOP-20TF71N:192096] [14] lmp(+0x2c737b)[0x64ad63d6737b]
[DESKTOP-20TF71N:192096] [15] lmp(+0x2c7d7f)[0x64ad63d67d7f]
[DESKTOP-20TF71N:192096] [16] lmp(+0x22ecb1)[0x64ad63ccecb1]
[DESKTOP-20TF71N:192096] [17] /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x76a65ea2a1ca]
[DESKTOP-20TF71N:192096] [18] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x76a65ea2a28b]
[DESKTOP-20TF71N:192096] [19] lmp(+0x2ba915)[0x64ad63d5a915]
[DESKTOP-20TF71N:192096] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 192096 on node DESKTOP-20TF71N exited on
signal 6 (Aborted).
--------------------------------------------------------------------------"
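
The backtrace above contains only raw addresses, which makes it hard to see where LAMMPS failed. As a rough sketch (not something taken from this thread), the `lmp(+0x...)` offsets can be pulled out of a saved copy of the log and handed to `addr2line` from GNU binutils. The function name `extract_lmp_offsets` and the file name `crash.log` are made up for illustration, and useful output assumes the `lmp` binary was built with debug information.

```shell
# Print the lmp-relative offsets (e.g. 0x2469bbd) found in a saved crash log.
extract_lmp_offsets() {
    # $1: text file holding the "[ N] lmp(+0x...)[0x...]" lines of the log
    # Lines from other libraries (e.g. libc.so.6) do not match and are skipped.
    sed -n 's/.*lmp(+\(0x[0-9a-fA-F]*\)).*/\1/p' "$1"
}
```

A possible use would be `extract_lmp_offsets crash.log | xargs addr2line -f -C -e "$(command -v lmp)"`, where `-f` prints function names and `-C` demangles C++ symbols.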

I am using the 22 July 2025 version of LAMMPS on Ubuntu 24.04 with CUDA 12.6.
These are the CMake commands I used to configure the packages:
“cmake -C …/cmake/presets/basic.cmake -C …/cmake/presets/kokkos-cuda.cmake …/cmake
cmake -D Kokkos_ENABLE_CUDA=yes -D Kokkos_ENABLE_OPENMP=yes -D PKG_KOKKOS=yes -D PKG_MEAM=on -D PKG_MOLECULE=on -D PKG_OPENMP=yes -D GPU_API=cuda -D GPU_ARCH=sm_89 …/cmake”
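
A side note on the configuration above, offered as something to verify rather than a diagnosis: as far as the LAMMPS build documentation describes, `GPU_API` and `GPU_ARCH` are settings of the GPU package, while the KOKKOS package selects its target architecture through `Kokkos_ARCH_*` options (`Kokkos_ARCH_ADA89` is the one listed for compute capability 8.9 cards). The sketch below shows one way to check which architecture, if any, was recorded in the build folder's `CMakeCache.txt`. The function name `kokkos_arch_from_cache` is hypothetical.

```shell
# List the Kokkos architecture options that are switched on in a CMake cache.
kokkos_arch_from_cache() {
    # $1: path to the CMakeCache.txt in the build folder
    # Cache entries look like NAME:TYPE=VALUE; keep only the enabled ones.
    grep -iE '^Kokkos_ARCH_[A-Za-z0-9_]+:[A-Za-z]+=(ON|YES|TRUE|1)$' "$1"
}
```

Running `kokkos_arch_from_cache build/CMakeCache.txt` (with `build` standing in for the actual build folder) prints nothing if no architecture option is enabled there.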
The command I use to start the simulation is (just in case):
“mpirun -np 1 lmp -k on g 1 -sf kk -pk kokkos neigh half newton on -in Simulation.lmp”

I searched the forum for this error and tried keeping my GPU cool during runs (at about 37 to 45 degrees Celsius), but the error still appears every so often.

I still can’t find out why this happens, so I would be grateful if anyone has suggestions or has seen this before.

As far as I read that discussion, the conclusion was that the cause was not heat but one defective GPU (out of 4).

The error is a very generic error from a low level library, so it is very difficult to give any suggestions without the ability to reproduce the error or knowing any details about your simulation.
Some questions:

  • you say you are using the 22 July 2025 version. Is that the original release or the update?
  • does the same error happen with other input decks, e.g. the LAMMPS bench inputs or some of the examples, or only with this one input?
  • does your simulation run to completion without errors, when you are not using KOKKOS?
  1. It was the original release. I did not check whether there was an update. Should I try the updated version?

  2. It happens only with this kind of simulation (indentation/friction).
    I have modeled DLC using the liquid-quenching method with the same potential files, parameters, and KOKKOS, and it ran without error.

  3. Yes. Although the number of atoms and the size of the simulation were different
    (past: 4,000 atoms, current: 20,000 atoms), it ran fine on the CPU.

You have to check which bugs are fixed. If there is no mention of bugfixes in the KOKKOS package, then the chance is small that it will address your problem.

This only counts if you run the exact same simulation. The issue could be triggered by your starting configuration.


I will try to simulate without using KOKKOS.

Thank you for your time and kind suggestion.

This error is basically the same as a segmentation fault on the CPU, and is typically due to either an out of bounds memory access or trying to access host memory inside a device kernel. I will try to reproduce on H100 when I get a chance.
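
Since the error behaves like a segmentation fault inside a GPU kernel, one common way to narrow it down is to rerun the failing input under NVIDIA's `compute-sanitizer` (shipped with the CUDA toolkit) with `CUDA_LAUNCH_BLOCKING=1` set, so that the report points at the offending kernel. The wrapper below is only a sketch under those assumptions: `run_memcheck` and the `DRY_RUN` switch are hypothetical names, and such runs are usually much slower than normal ones.

```shell
# Run a command under compute-sanitizer's memcheck tool with blocking launches.
run_memcheck() {
    # Prepend the debugging environment and tool to the given command line.
    set -- env CUDA_LAUNCH_BLOCKING=1 compute-sanitizer --tool memcheck "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"      # DRY_RUN=1: only show what would be executed
    else
        "$@"
    fi
}
```

With the command line from the original post this would be `run_memcheck lmp -k on g 1 -sf kk -pk kokkos neigh half newton on -in Simulation.lmp`.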

@stamoor Thank you for your reply!
Just one question: could the neighbor list size be one of the causes of this memory error?

Dr. Akohlmey

  1. I have checked the update, but there seems to be no mention of bugfixes in the KOKKOS package.

  2. I have run the exact same simulation without the KOKKOS package and it runs well.

Would you have any further suggestions for troubleshooting steps I could check next?

Thank you for your continued assistance.

There is not enough information here for a more detailed diagnosis and resulting suggestions.

@FTLMD Can you please post a minimal working example of the issue so we can debug? Thank you.

Thank you for your assistance.

What I am currently trying to run is a friction (sliding) simulation:
a Si tip sliding on the surface of a Zr-doped carbon substrate.
The atom types used are: C, Zr, Si
I have used the hybrid pair style as follows:
C-C, C-Zr, Zr-Zr : MEAM potential
Si-Si : Tersoff potential
C-Si, Zr-Si : LJ potential

The simulation process is as

  1. Relaxation
  2. Indentation of the tip
  3. Relaxation
  4. Sliding
  5. Relaxation
  6. Unloading of the tip

Simulation condition :
[ Modelling :
Zr doped substrate
Fixed layer - fixed with the move linear keyword
Thermostat layer - NVT at 300 K
Newtonian layer - NVE

Si tip :
A hemisphere fixed or moving using the move linear keyword.

The normal load (indentation force) I am trying to apply is 150 nN (approximately 93.59 eV/A)
Indentation and sliding speed : 0.1 A/ps
timestep : 0.25 fs
]
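
For reference, the load conversion quoted above can be checked with a one-line calculation. This is a minimal sketch using the exact SI value of the electron volt; the function name `nN_to_eV_per_A` is made up for illustration. It yields about 93.6 eV/A for 150 nN, in line with the approximate value given in the post.

```shell
# Convert a force from nanonewtons to eV/Angstrom (LAMMPS "metal" force units).
nN_to_eV_per_A() {
    # 1 eV/Angstrom = 1.602176634e-19 J / 1e-10 m = 1.602176634 nN
    awk -v f="$1" 'BEGIN { printf "%.2f\n", f / 1.602176634 }'
}

nN_to_eV_per_A 150   # prints 93.62
```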

I am trying to speed up the simulation using the KOKKOS package (compiled for a GeForce RTX 4090 GPU).
The problem is that the CUDA illegal memory access error shown above appears during the indentation or sliding step (at random, but mostly during indentation).

I have run the same simulation on the CPU and it runs fine without any error.

In addition I have tried

  1. increasing the neighbor list size,
  2. a slower indentation speed (0.05 A/ps),
  3. a reduced timestep (0.1 fs).
    But all of them show the same error at the indentation/sliding step when using the KOKKOS package.

If there are any other steps I should take or any information you require, please let me know and I’ll respond as soon as possible.
Thank you for your consideration.

We need a LAMMPS input file and data file–everything to run LAMMPS, not just a text description.

I’m terribly sorry, I misunderstood what you meant.

Here are the files,

Data file
Assembly.data (2.2 MB)

Input script
Simulation.lmp (6.0 KB)

Potential files
2005_SiC.tersoff (1.8 KB)
ZrC.meam (703 Bytes)
ZrC_library.meam (519 Bytes)

Thank you, Dr. Stamoor
