I am trying to use KOKKOS with and without a GPU. I’m finding little speedup for my problem with KOKKOS without a GPU, but some speedup with a GPU. I am worried that I am not using the GPU I intend. I have a GeForce GT720 at GPU ID 0 and a Tesla K40 at GPU ID 1 (from the output of nvidia-smi). Obviously I intend to use the K40.
How do KOKKOS and LAMMPS know which GPU to use?
Also, I’m getting segmentation faults if I try to use more than 1 MPI task when using a GPU. I’ve compiled MPI and KOKKOS OMP versions of LAMMPS with Intel compilers, and I’ve compiled KOKKOS CUDA OMP with GNU compilers (only because I was getting errors when trying to compile KOKKOS CUDA with Intel).
I have about 95,000 atoms with a lot of harmonic bonds, angles, and OPLS torsions, and I am using lj/cut as the base pair potential, which I think gets turned into a KOKKOS potential. No electrostatics.
Should I expect to get speedup with KOKKOS without a GPU?
I found a few issues that I didn’t find in the manual. As a newbie to GPUs, I wanted to mention them in case someone else comes across them.
- I was getting errors when trying to start KOKKOS from a restart file. I’m not sure if this is because the restart file was written with an October 20 version of LAMMPS, whereas the version I compiled with KOKKOS is from Nov 22.
- I originally was running lj/cut/opt, and apparently KOKKOS didn’t do anything with it under the kk flag. I needed to change to lj/cut so that KOKKOS would recognize it, I guess.
Thanks in advance.
Below is how I’m trying to run:
KOKKOS, OMP (Compiled with Intel Compilers with -qopenmp flag)
mpirun -np 1 lmp_kokkos_omp_intel -k on t 16 -sf kk -in in.lammps
KOKKOS, CUDA, OMP (Compiled with openmpi)
mpirun -np 1 lmp_kokkos_cuda_openmpi -k on t 16 g 1 -sf kk -in in.lammps
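
On the which-GPU question, here is a minimal sketch of one way the K40 might be pinned. It assumes the standard CUDA environment variables CUDA_DEVICE_ORDER and CUDA_VISIBLE_DEVICES are honored by the KOKKOS CUDA build, which I have not verified for this binary. By default the CUDA runtime may number devices fastest-first rather than in the PCI bus order that nvidia-smi reports, so the two sets of IDs can differ. Also, as far as I understand it, the g value after -k on is the number of GPUs per node, not a GPU ID. The LAUNCH variable is just an illustrative name, and the sketch prints the command instead of running it.

```shell
# Assumption: nvidia-smi lists the Tesla K40 as GPU 1 (PCI bus order).
export CUDA_DEVICE_ORDER=PCI_BUS_ID   # number devices the same way nvidia-smi does
export CUDA_VISIBLE_DEVICES=1         # expose only GPU 1; the process then sees it as device 0

# Same launch line as above; "g 1" here means one GPU per node (a count, not an ID).
LAUNCH="mpirun -np 1 lmp_kokkos_cuda_openmpi -k on t 16 g 1 -sf kk -in in.lammps"

# Printed rather than executed in this sketch; run the printed line to launch.
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
echo "$LAUNCH"
```

With only one device visible, there should be no ambiguity about which GPU the run lands on. Running nvidia-smi in a second terminal during the job should then show the lmp process listed under the K40.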