I am trying to use KOKKOS with and without a GPU. I’m finding little speedup for my problem with KOKKOS without a GPU, but some speedup with a GPU. I am worried that I am not using the GPU I intend. I have a GeForce GT720 on GPU ID 0 and a Tesla K40 on GPU ID 1 (from the output of nvidia-smi). Obviously I intend to use the K40.

How do KOKKOS and LAMMPS know which GPU to use?

Also, I’m getting segmentation faults if I try to use more than 1 MPI task when using a GPU. I’ve compiled MPI and KOKKOS OMP versions of LAMMPS with Intel compilers and I’ve compiled KOKKOS CUDA OMP with GNU compilers (only because I was getting errors when trying to compile with Intel with KOKKOS CUDA).

I have about 95,000 atoms with a lot of harmonic bonds, angles, and OPLS torsions, and I am using lj/cut as the base pair potential, which I think gets turned into a KOKKOS potential. No electrostatics.

Should I expect to get speedup with KOKKOS without a GPU?

I ran into a few issues that I didn’t find covered in the manual. As a newbie to GPUs, I wanted to mention them in case someone else comes across them.

  1. I was getting errors when trying to start KOKKOS from a restart file. I’m not sure if this is because the restart file was written with an October 20 version of LAMMPS, while the version I compiled with KOKKOS is from Nov 22.
  2. I was originally running lj/cut/opt, and apparently KOKKOS didn’t do anything with it when using the kk flag. I needed to change to lj/cut so that KOKKOS would recognize it, I guess.

Thanks in advance.


Below is how I’m trying to run:

KOKKOS, OMP (Compiled with Intel Compilers with -qopenmp flag)

mpirun -np 1 lmp_kokkos_omp_intel -k on t 16 -sf kk -in in.lammps

KOKKOS, CUDA, OMP (Compiled with openmpi)

mpirun -np 1 lmp_kokkos_cuda_openmpi -k on t 16 g 1 -sf kk -in in.lammps


I’m not sure how you can choose the GPU ID with KOKKOS. According to the documentation of the package command, you can specify the GPU ID with the GPU package, but no such specification is available for KOKKOS.
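
One general CUDA-level workaround (not something specific to KOKKOS, and not tested on this machine) is to hide the unwanted card from the process with the CUDA_VISIBLE_DEVICES environment variable. The sketch below assumes the nvidia-smi numbering from the question (K40 = ID 1) and an Open MPI mpirun; the LAUNCH variable is just a name made up for this example. It prints the launch line and only runs it if the binary is on the PATH.

```shell
# Make CUDA number devices in PCI bus order, which is the order nvidia-smi uses.
# (CUDA's default order is "fastest first", so the two numberings can differ.)
export CUDA_DEVICE_ORDER=PCI_BUS_ID

# Expose only the device listed as ID 1 (the K40 in the question).
# Inside the process this single visible device is renumbered as device 0.
export CUDA_VISIBLE_DEVICES=1

# Same launch line as in the question. "g 1" is the number of GPUs per node,
# not a device ID. "-x" asks Open MPI to forward the variables to the ranks.
LAUNCH="mpirun -np 1 -x CUDA_DEVICE_ORDER -x CUDA_VISIBLE_DEVICES lmp_kokkos_cuda_openmpi -k on t 16 g 1 -sf kk -in in.lammps"
echo "$LAUNCH"

# Run only if the LAMMPS binary is actually available.
if command -v lmp_kokkos_cuda_openmpi >/dev/null 2>&1; then
  $LAUNCH
fi
```

While the job runs, nvidia-smi should list the LAMMPS process under the K40 if the selection worked.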

About the MPI segfault when running on multiple processes: this is probably because OpenMPI was not built with the --with-cuda flag, see discussion here:

Also, it is not recommended to use multiple GPUs if they are not of the same type. I think KOKKOS assumes that each GPU has the same performance, so the workload is distributed equally.
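
Coming back to the segfault with more than one MPI task: a quick way to check whether an Open MPI installation was built with CUDA support is to query ompi_info. This is only a sketch; it assumes a reasonably recent Open MPI that reports the mpi_built_with_cuda_support parameter, and CUDA_AWARE is just a variable name chosen for this example.

```shell
# Ask Open MPI whether it was built with CUDA support.
# A CUDA-aware build should report a line ending in "value:true".
if command -v ompi_info >/dev/null 2>&1; then
  CUDA_AWARE=$(ompi_info --parsable --all | grep "mpi_built_with_cuda_support:value" || true)
else
  CUDA_AWARE=""
fi

# Print the result, or a note if nothing could be determined.
echo "${CUDA_AWARE:-could not determine CUDA support (ompi_info missing or parameter not reported)}"
```

If the line ends in "value:false" or nothing is reported, rebuilding Open MPI with --with-cuda would be the first thing to try.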

Many pair styles (such as Stillinger-Weber and Vashishta) will give a nice performance boost with KOKKOS even without a GPU, because the implementation contains several nice tricks.

About questions 1 and 2, other people can give better answers than I can.