reax kokkos cuda cudaDeviceSynchronize() error (cudaErrorIllegalAddress)

no way 8x the hardware... the bulk of the work is done by the GPUs, and there are 8 of them in both cases. it only means that 8 processes/host cores are more effective than one at keeping the 8 GPUs busy.

All the best,
Ionut Nicolae.

yes, way!

to the best of my knowledge, KOKKOS will use at most 1 GPU per MPI rank. you may attach multiple MPI ranks to the same GPU, but not the other way around. thus if you run with mpirun -np 1, it doesn’t matter whether you tell LAMMPS that you have 8 GPUs on your node: it will only use 1 of them. please check with nvidia-smi. here are my command line and output for running on a 2 GPU node with only 1 MPI rank (running in.lj from the LAMMPS bench folder).

axel.

$ mpirun --mca btl sm,self -np 1 lmp_kokkos_cuda_mpi -in in.lj -v x 4 -v y 4 -v z 4 -k on g 2 -pk kokkos -sf kk
LAMMPS (2 Aug 2018)
KOKKOS mode is enabled (…/kokkos.cpp:45)
using 2 GPU(s)
Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
Created orthogonal box = (0 0 0) to (134.368 134.368 134.368)
1 by 1 by 1 MPI processor grid
Created 2048000 atoms
Time spent = 12.2366 secs
Neighbor list info …
update every 20 steps, delay 0 steps, check no
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 2.8
ghost atom cutoff = 2.8
binsize = 1.4, bins = 96 96 96
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair lj/cut/kk, perpetual
attributes: full, newton off, kokkos_device
pair build: full/bin/kk/device
stencil: full/bin/3d
bin: kk/device
Setting up Verlet run …
Unit style : lj
Current step : 0
Time step : 0.005
Per MPI rank memory allocation (min/avg/max) = 305.3 | 305.3 | 305.3 Mbytes
Step Temp E_pair E_mol TotEng Press
0 1.44 -6.7733681 0 -4.6133691 -5.0196699
100 0.75921977 -5.7611315 0 -4.6223024 0.19208802
Loop time of 2.78785 on 1 procs for 100 steps with 2048000 atoms

Performance: 15495.807 tau/day, 35.870 timesteps/s
67.3% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
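
One way to do the nvidia-smi check suggested above is to count how many GPUs report non-zero utilization while LAMMPS is running. The following is only a sketch: the helper name `count_busy_gpus` is made up for illustration, and it assumes the CSV query output has one line per GPU in the form `index, utilization`.

```shell
# count_busy_gpus (hypothetical helper): reads "index, utilization" lines
# on stdin and prints how many GPUs show a utilization above 0.
count_busy_gpus() {
  awk -F', *' '$2 + 0 > 0 { n++ } END { print n + 0 }'
}

# Run this in a second terminal while the simulation is active.
# The query is skipped on machines without nvidia-smi.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits \
    | count_busy_gpus
fi
```

With the 1-rank run shown above, this would be expected to report a single busy GPU even though `-k on g 2` was given.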

"Cuda const random access View using Cuda texture memory requires Kokkos to allocate the View’s memory [ServerS:08920]". Complete error message is attached.

I’ve seen this error occur in multiple places in LAMMPS. It happens when an MPI task has no particles. This is a new error caused by updating the Kokkos library. I’ve reported the issue to the Kokkos developers (also cc’d Christian Trott); a fix should be released soon.

With the KOKKOS package you definitely must use at least 1 MPI task per GPU, so if you have 8 GPUs you must use at least 8 MPI tasks. I believe if you only use 1 MPI task then you are also only using 1 GPU even though you requested 8.
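
As a rough sketch of that rule, the launch line can be built so that the MPI rank count and the `-k on g` GPU count are always the same number. The function name `build_lammps_cmd` is hypothetical; the executable name and the KOKKOS switches are copied from the command line earlier in this thread, and the Open MPI specific `--mca` option is left out.

```shell
# build_lammps_cmd NGPUS INPUT (hypothetical helper): print an mpirun
# command line that starts one MPI rank per GPU.
build_lammps_cmd() {
  ngpus="$1"
  input="$2"
  # -np and "-k on g" use the same count, so every GPU gets its own rank
  echo "mpirun -np ${ngpus} lmp_kokkos_cuda_mpi -in ${input} -k on g ${ngpus} -pk kokkos -sf kk"
}

build_lammps_cmd 8 in.lj
# prints: mpirun -np 8 lmp_kokkos_cuda_mpi -in in.lj -k on g 8 -pk kokkos -sf kk
```

Using more ranks than GPUs is also allowed, since several ranks can share one GPU; this sketch only covers the minimum of one rank per GPU.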

Stan

indeed! many thanks, I wasn't aware of that.

at 739778 atoms, it scales better: 22 s vs. 1 min 22 s.
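
For reference, the ratio between those two timings can be worked out as below. This is just arithmetic on the quoted figures, reading them as 22 seconds and 1 minute 22 seconds; which run configuration each figure belongs to is not stated beyond what is written in the thread.

```shell
# Timings quoted above, converted to seconds.
fast=22
slow=$((1 * 60 + 22))   # 1 min 22 s = 82 s

# awk does the floating-point division that plain shell arithmetic lacks
speedup=$(awk -v s="$slow" -v f="$fast" 'BEGIN { printf "%.2f", s / f }')
echo "speedup: ${speedup}x"   # prints "speedup: 3.73x"
```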

All the best,
Ionut Nicolae.