[lammps-users] Busywaiting issue using GPUs on EPYC platform

msjacobs2727 · March 29, 2021, 9:32pm

Hi all,

I’m running lammps20201029 with OpenMPI 4.0.5 on a platform with 2x EPYC 7702 CPUs (64 cores each) and 4x Nvidia RTX 6000 GPUs, testing with the LJ melt input script.

If I don’t use the GPU package, the script runs fine. If I use the GPU package with 63 or fewer cores, it runs fine.

However, if I use the GPU package with 64 or more cores, I get the first two lines of output:

LAMMPS (29 Oct 2020)

using 1 OpenMP thread(s) per MPI task

And then nothing. The job doesn’t end, and the CPU cores are fully loaded according to top. In nvidia-smi, I can see that the processes have allocated memory on the GPUs, but the GPUs show no utilization.

Do you have any ideas?

Michael Jacobs

Dobrynin Group

Department of Chemistry

UNC Chapel Hill

akohlmey · March 29, 2021, 10:48pm

are you using the CUDA multi-process service (CUDA_MPS)? and have you compiled LAMMPS accordingly?
otherwise, oversubscribing your GPUs by a factor of 16 is unlikely to be efficient. the optimum is usually in the 4x-8x range.
you may be hitting some internal limit of the CUDA driver or of some other internal component.
if you want to squeeze more performance out of the CPU parts of the code, you could use OpenMP multi-threading from the USER-OMP and/or USER-INTEL packages.
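
to make the numbers concrete, here is a rough sketch of the arithmetic for the machine described above (128 cores, 4 GPUs). the function names are made up for illustration and are not part of LAMMPS, and the sketch assumes that MPI tasks are spread evenly across the GPUs and that leftover cores would be given to OpenMP threads.

```python
# Illustrative arithmetic only; these helpers are hypothetical, not a LAMMPS API.

def oversubscription(mpi_tasks, gpus):
    """MPI tasks sharing each GPU, assuming an even distribution."""
    return mpi_tasks / gpus


def task_range(gpus, low=4, high=8):
    """Total MPI task counts that give `low` to `high` tasks per GPU."""
    return low * gpus, high * gpus


def threads_per_task(cores, mpi_tasks):
    """OpenMP threads per MPI task if every core is used exactly once."""
    return cores // mpi_tasks


if __name__ == "__main__":
    cores, gpus = 128, 4  # 2x EPYC 7702 and 4x RTX 6000, from the first post
    for tasks in (16, 32, 63, 64, 128):
        factor = oversubscription(tasks, gpus)
        print(f"{tasks:3d} MPI tasks on {gpus} GPUs: {factor:.2f} tasks per GPU")
    lo, hi = task_range(gpus)
    print(f"4x-8x range: {lo} to {hi} MPI tasks in total")
    print(f"threads per task at {hi} tasks: {threads_per_task(cores, hi)}")
```

with 4 GPUs, 64 MPI tasks is the 16x case mentioned above, while the 4x-8x range corresponds to roughly 16 to 32 MPI tasks in total. whether that range is actually the best choice for this hardware would need to be benchmarked.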

axel.