I encountered this issue while working with the ReaxFF potential. A file with the complete error message is attached, along with the scripts that cause it. The file names are self-explanatory.
All scripts work with lmp_serial and lmp_kokkos_mpi_only.
SCRIPT FILE DESCRIPTION
Randomly creating atoms in the simulation box causes the error (in.not_working_random_O_test).
Simulating an off-center (relative to the simulation box) silicon cluster causes the same error (in.not_working_Si_cluster).
Simulating a centered silicon cluster works (in.working_Si_cluster).
A simulation with the entire box occupied by a silicon lattice works (script file not attached). A sketch of the kind of input involved follows below.
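For readers without the attachments, here is a minimal sketch of the kind of input described above. The box size, cluster radius, force-field file name, and charge-equilibration settings are my assumptions, not taken from the attached scripts:

    units           real
    atom_style      charge
    boundary        p p p
    lattice         diamond 5.43
    region          box block 0 34 0 34 0 34 units box
    create_box      2 box
    # off-center cluster, as in in.not_working_Si_cluster:
    region          cluster sphere 16 17 17 6 units box
    create_atoms    1 region cluster
    # random placement instead, as in in.not_working_random_O_test:
    # create_atoms  2 random 100 12345 box
    pair_style      reax/c NULL
    pair_coeff      * * ffield.reax.SiO Si O
    fix             qeq all qeq/reax 1 0.0 10.0 1e-6 reax/c
    timestep        0.25
    fix             integrate all nve
    run             100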
First, a correction: the Si cluster had to be zero-centered in order to work, NOT simulation-box-centered as stated in my first email.
After applying the patches, simulations with randomly generated atoms work, and the error generated by in.not_working_Si_cluster has changed to:

"Cuda const random access View using Cuda texture memory requires Kokkos to allocate the View's memory"

The complete error message is attached.
Also, an update on the conditions triggering the error: if the cluster is centered at (16 16 17) or below, the simulation works; if it is centered at (16 17 17) or above, the error occurs. The two cases are shown as region lines below.
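For concreteness, using the same assumed geometry as the sketch above (the radius is still my assumption), the two cases differ only in the region center:

    region cluster sphere 16 16 17 6 units box    # works
    region cluster sphere 16 17 17 6 units box    # triggers the Kokkos error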
ERROR occurs when:
mpirun -np 8 ./lammps-16Mar18/src/lmp_kokkos_cuda_mpi -in in.not_working_Si_cluster -k on g 8 -sf kk
or
mpirun ./lammps-16Mar18/src/lmp_kokkos_cuda_mpi -in in.not_working_Si_cluster -k on g 8 -sf kk    # launches one process per core = 20 processes
WORKS when:
mpirun -np 1 ./lammps-16Mar18/src/lmp_kokkos_cuda_mpi -in in.not_working_Si_cluster -k on g 8 -sf kk
or
./lammps-16Mar18/src/lmp_kokkos_cuda_mpi -in in.not_working_Si_cluster -k on g 1 -sf kk
Doing this makes no sense. Your input is tiny. Even running with 1 GPU is slower than running the same problem on a decent CPU. When using more MPI tasks, most of them will get empty volumes with the standard domain decomposition.
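To illustrate the decomposition point (this is my own sketch, not something proposed in the thread): when the atoms occupy only a small part of the box, LAMMPS's balance command can shift subdomain boundaries so that MPI ranks receive comparable atom counts instead of empty volumes:

    # shift subdomain planes in x, y, z; up to 20 iterations,
    # stop once max/avg atom imbalance drops below 1.1
    balance 1.1 shift xyz 20 1.1
    # or, with tiled communication, recursive coordinate bisection:
    # comm_style tiled
    # balance 1.1 rcb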
With a ~45000-atom cluster (60 angstrom radius) and 1000 steps, mpirun -np 8 gives 30 s total wall time, while mpirun -np 1 gives 50 s: almost double. See the attached file.