Error when running LAMMPS

cudaEventSynchronize(CudaInternal::constantMemReusable) error( cudaErrorIllegalAddress): an illegal memory access was encountered …/…/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp:617

LAMMPS (2 Aug 2023 - Update 1)
KOKKOS mode is enabled (…/kokkos.cpp:107)
will use up to 2 GPU(s) per node
Reading data file …
orthogonal box = (-0.619498 -0.639998 -0.635497) to (70.6205 70.640001 70.6345)
1 by 2 by 1 MPI processor grid
reading atoms …
15145 atoms
read_data CPU = 0.068 seconds

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:

  • pair reaxff command: doi:10.1016/j.parco.2011.08.005
  • fix qeq/reaxff command: doi:10.1016/j.parco.2011.08.005
    The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Switching to 'neigh_modify every 1 delay 0 check yes' setting during minimization
Neighbor list info …
update: every = 1 steps, delay = 0 steps, check = yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 12.5
ghost atom cutoff = 12.5
binsize = 12.5, bins = 6 6 6
2 neighbor lists, perpetual/occasional/extra = 2 0 0
(1) pair reax/c/kk, perpetual
attributes: half, newton off, ghost, kokkos_device
pair build: half/bin/newtoff/ghost/kk/device
stencil: full/ghost/bin/3d
bin: kk/device
(2) fix qeq/reax/kk, perpetual
attributes: full, newton off, kokkos_device
pair build: full/bin/kk/device
stencil: full/bin/3d
bin: kk/device
Setting up cg/kk style minimization …
Unit style : real
Current step : 0
WARNING: Fix with atom-based arrays not compatible with sending data in Kokkos communication, switching to classic exchange/border communication (…/comm_kokkos.cpp:666)
WARNING: Fix with atom-based arrays not compatible with Kokkos sorting on device, switching to classic host sorting (…/atom_kokkos.cpp:178)
Per MPI rank memory allocation (min/avg/max) = 176.6 | 183.1 | 189.7 Mbytes
Step Temp Press TotEng Density
0 303.15 -nan -nan 0.6494629
cudaEventSynchronize(CudaInternal::constantMemReusable) error( cudaErrorIllegalAddress): an illegal memory access was encountered …/…/lib/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp:617
Backtrace:
[0x2355813]
[0x233b118]
[0x233b14b]
[0x234308d]
[0xb85c1b]
[0xbec722]
[0xbecb74]
[0xbecf34]
[0xc4d69c]
[0x18b0f13]
[0x13ebf6a]
[0x13efec0]
[0xaed0c1]
[0x18b1901]
[0x1bbef65]
[0x13db06e]
[0x13db8db]
[0x48eb98]
__libc_start_main [0x7fb6a0a89555]
[0x4e1ab7]
[g0002:19705] *** Process received signal ***
[g0002:19705] Signal: Aborted (6)
[g0002:19705] Signal code: (-6)
[g0002:19705] [ 0] /lib64/libpthread.so.0(+0xf630)[0x7fb6a1737630]
[g0002:19705] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x7fb6a0a9d387]
[g0002:19705] [ 2] /lib64/libc.so.6(abort+0x148)[0x7fb6a0a9ea78]
[g0002:19705] [ 3] lmp_kokkos_cuda_mpi[0x233b150]
[g0002:19705] [ 4] lmp_kokkos_cuda_mpi[0x234308d]
[g0002:19705] [ 5] lmp_kokkos_cuda_mpi[0xb85c1b]
[g0002:19705] [ 6] lmp_kokkos_cuda_mpi[0xbec722]
[g0002:19705] [ 7] lmp_kokkos_cuda_mpi[0xbecb74]
[g0002:19705] [ 8] lmp_kokkos_cuda_mpi[0xbecf34]
[g0002:19705] [ 9] lmp_kokkos_cuda_mpi[0xc4d69c]
[g0002:19705] [10] lmp_kokkos_cuda_mpi[0x18b0f13]
[g0002:19705] [11] lmp_kokkos_cuda_mpi[0x13ebf6a]
[g0002:19705] [12] lmp_kokkos_cuda_mpi[0x13efec0]
[g0002:19705] [13] lmp_kokkos_cuda_mpi[0xaed0c1]
[g0002:19705] [14] lmp_kokkos_cuda_mpi[0x18b1901]
[g0002:19705] [15] lmp_kokkos_cuda_mpi[0x1bbef65]
[g0002:19705] [16] lmp_kokkos_cuda_mpi[0x13db06e]
[g0002:19705] [17] lmp_kokkos_cuda_mpi[0x13db8db]
[g0002:19705] [18] lmp_kokkos_cuda_mpi[0x48eb98]
[g0002:19705] [19] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fb6a0a89555]
[g0002:19705] [20] lmp_kokkos_cuda_mpi[0x4e1ab7]
[g0002:19705] *** End of error message ***

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun noticed that process rank 0 with PID 19705 on node g0002 exited on signal 6 (Aborted).

This is the input file:
variable inname string "in"
variable basename string "pyrolysis"

units real
atom_style charge
read_data ${inname}.data

pair_style reax/c NULL
pair_coeff * * HCONSB.ff C H N O S

fix reax_qeq all qeq/reax 50 0.0 8.0 1e-4 reax/c

neighbor 2.5 bin
neigh_modify every 500 delay 0 check no

thermo 500
thermo_style custom step temp press etotal density
dump traj all custom 500 ${basename}.lammpstrj id type x y z
log ${basename}.log

fix ensemble all nve
fix berendsen all temp/berendsen 303.15 2273.15 10
velocity all create 303.15 114514

minimize 1e-4 1e-6 1000 1000
timestep 0.1
run 1000000

write_data ${basename}.data

Some comments:

  • please post in English
  • please read and follow the forum guidelines
  • when reporting errors, please always check if they can be reproduced with the latest release (there has been a new stable release and several feature releases since)
  • if you want a proper response, please have the courtesy of formulating a proper question. Just copying inputs and outputs and errors is considered rather rude.

Your simulation already blew up on timestep 0 with NaNs (not a number):

Step Temp Press TotEng Density
0 303.15 -nan -nan 0.6494629

So it is likely a problem with your force field or starting configuration. Does the same issue happen with the non-Kokkos (vanilla CPU) version? As Axel said, we need a full input, data files, force field files, etc. if you want us to look at the issue.
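One quick way to check the starting configuration, offered here as a hypothetical sketch and not something posted in the thread, is to look for atoms that sit on top of, or unphysically close to, each other in the data file, since such overlaps are a common source of huge or non-finite energies at step 0. The script assumes `atom_style charge` (as in the input above, so each `Atoms` line is `id type q x y z` with optional image flags), an orthogonal box that is periodic in x, y and z, and a default cutoff of 0.7 Å, chosen to be shorter than the H-H bond in H2 (about 0.74 Å). The script name and the helpers `read_data`/`close_pairs` are illustrative assumptions, not part of LAMMPS.

```python
import math
import sys
from itertools import product


def read_data(path):
    """Return (lo, lengths, coords); coords maps atom id -> (x, y, z).

    Assumes an orthogonal box and atom_style charge: id type q x y z [ix iy iz].
    """
    lo, hi = [0.0] * 3, [0.0] * 3
    coords = {}
    section = None
    with open(path) as fh:
        fh.readline()  # the first line of a data file is a free-form title
        for raw in fh:
            words = raw.split("#", 1)[0].split()  # drop trailing comments
            if not words:
                continue
            if words[0][0].isalpha():  # section header such as "Atoms" or "Masses"
                section = " ".join(words)
                continue
            if section is None:  # header part: look for "xlo xhi" style lines
                for k, axis in enumerate("xyz"):
                    if words[2:] == [axis + "lo", axis + "hi"]:
                        lo[k], hi[k] = float(words[0]), float(words[1])
            elif section == "Atoms":
                coords[int(words[0])] = tuple(float(w) for w in words[3:6])
    lengths = [h - l for l, h in zip(lo, hi)]
    return lo, lengths, coords


def close_pairs(lo, lengths, coords, cutoff):
    """Return sorted (distance, id_i, id_j) for all pairs closer than cutoff."""
    if cutoff >= 0.5 * min(lengths):
        raise ValueError("cutoff must be below half the shortest box length")
    # cell list: every cell edge is >= cutoff, so partners lie in adjacent cells
    ncell = [max(1, int(L // cutoff)) for L in lengths]

    def cell_of(r):
        return tuple(math.floor((r[k] - lo[k]) / lengths[k] * ncell[k]) % ncell[k]
                     for k in range(3))

    cells = {}
    for aid, r in coords.items():
        cells.setdefault(cell_of(r), []).append(aid)

    pairs = []
    for (cx, cy, cz), members in cells.items():
        # a set removes duplicate neighbour cells when a dimension has < 3 cells
        neighbours = {((cx + dx) % ncell[0], (cy + dy) % ncell[1], (cz + dz) % ncell[2])
                      for dx, dy, dz in product((-1, 0, 1), repeat=3)}
        for i in members:
            for key in neighbours:
                for j in cells.get(key, ()):
                    if j <= i:
                        continue  # count each pair only once
                    d2 = 0.0
                    for k in range(3):
                        d = coords[j][k] - coords[i][k]
                        d -= lengths[k] * round(d / lengths[k])  # minimum image
                        d2 += d * d
                    if d2 < cutoff * cutoff:
                        pairs.append((math.sqrt(d2), i, j))
    return sorted(pairs)


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python close_contacts.py in.data [cutoff_in_angstrom]
    lo, lengths, coords = read_data(sys.argv[1])
    cutoff = float(sys.argv[2]) if len(sys.argv) > 2 else 0.7
    for dist, i, j in close_pairs(lo, lengths, coords, cutoff)[:20]:
        print(f"atoms {i} and {j}: {dist:.3f} A")
```

Run it as `python close_contacts.py in.data`, optionally followed by a different cutoff in Å. It prints at most the 20 closest pairs below the cutoff; no output means no such pair was found. A cell list is used instead of checking all pairs because the system has about 15,000 atoms, which would mean over 100 million pair distances in pure Python. A clean result does not rule out a force field problem, which is why comparing against the non-Kokkos CPU run is still worthwhile.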