Poor GPU utilization on HPC

Hi all,

I built the 5 June 2019 version of LAMMPS on my HPC cluster using CMake, with OpenMP and NVIDIA GPU acceleration. When I run my simulations, I see that only 5% of my GPU is being utilized. This performance seems quite poor, and I need help figuring out which parameters to use for my GPU hardware to get better performance.

I have CUDA 10.0 and NVIDIA GeForce GTX 1080 Ti GPUs. I am using the following build command:

cmake -D BUILD_OMP=yes -D LAMMPS_MACHINE=mpi -D PKG_ASPHERE=yes -D PKG_MANYBODY=on -D PKG_USER-COLVARS=on -D PKG_MOLECULE=on -D PKG_KSPACE=on -D PKG_RIGID=on -D PKG_REPLICA=on -D PKG_GPU=on -D GPU_API=cuda -D GPU_ARCH=sm_61 ../cmake

I used GPU_ARCH=sm_61 because the GTX 1080 Ti has the Pascal microarchitecture, and sm_60/sm_61 is recommended for Pascal in the LAMMPS documentation.

I also compiled with sm_30, sm_60, and sm_75 separately, but all of these give the same poor performance.

I would appreciate any help figuring out how to improve performance.

Thanks in advance,
Madhur Aggarwal

The only way we can help improve your performance is if you can give us your input script so that we know what you are doing. I get great performance with a simple Lennard-Jones liquid on a GTX 1080 Ti with Kokkos, but if you are using rigid bodies then you are stuck (right now) because they are not ported to Kokkos (yet).

Hi Stefan,

Thanks for your prompt reply. I have attached my input script to this mail. I am doing umbrella sampling using the COLVARS module. I have a separate input file for each umbrella sampling window, and I want each window to run on multiple CPUs and at least 1 GPU.

Also, as per your reply, I tried to compile LAMMPS with KOKKOS, but it gives me a warning followed by an error while compiling it with the COLVARS package.

Warning: lammps-5Jun19/lib/colvars/colvarvalue.h(371): warning: statement is unreachable

Error:
lammps-5Jun19/lib/colvars/colvarbias_alb.cpp(391): error: more than one instance of overloaded function "fmax" matches the argument list:
            function "fmax(double, double)"
            function "fmax(float, float)"
            argument types are: (int, int)

Please let me know if you have any idea how to solve this as well.

Thanks a lot,
Madhur Aggarwal

lammps.inp (1.66 KB)

Hi Stefan,

Thanks for your prompt reply. I have attached my input script to this mail. I am doing umbrella sampling using the COLVARS module. I have a separate input file for each umbrella sampling window, and I want each window to run on multiple CPUs and at least 1 GPU.

you can’t get much GPU utilization because the pair style you are using has not been ported to any GPU-accelerated package. only pppm will run on the GPU (and there the speedup is small compared to what you would get with a GPU-accelerated pair style).

you can see which styles have “accelerated” versions in the overview pages, e.g. for pair styles at: https://lammps.sandia.gov/doc/Commands_pair.html
the letters in parentheses indicate which accelerated variants are available. your pair style has none, and thus it is no surprise that you don’t get much GPU utilization.

running on the CPU is probably just as fast if not faster.
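for reference (a hedged sketch, assuming a build with -D LAMMPS_MACHINE=mpi so the binary is named lmp_mpi, and an input script named lammps.inp): when a style does have a (g) variant, the GPU package still has to be enabled at run time, e.g. with the -sf and -pk command-line switches:

```shell
# run 4 MPI ranks; "-pk gpu 1" enables the GPU package with 1 GPU per node,
# and "-sf gpu" appends the /gpu suffix to every style that has a GPU variant.
# binary name and input file name are placeholders for this setup.
mpirun -np 4 lmp_mpi -sf gpu -pk gpu 1 -in lammps.inp
```

without these switches (or equivalent package/suffix commands in the input script), a GPU-enabled binary will still run everything on the CPU.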

Also, as per your reply, I tried to compile LAMMPS with KOKKOS, but it gives me a warning followed by an error while compiling it with the COLVARS package.

Warning: lammps-5Jun19/lib/colvars/colvarvalue.h(371): warning: statement is unreachable

Error:
lammps-5Jun19/lib/colvars/colvarbias_alb.cpp(391): error: more than one instance of overloaded function "fmax" matches the argument list:
            function "fmax(double, double)"
            function "fmax(float, float)"
            argument types are: (int, int)

Please let me know if you have any idea how to solve this as well.

have you tried the latest stable LAMMPS release (7 Aug 2019)? it contains some bugfixes for USER-COLVARS.

axel.

Hi Madhur, first, follow Axel’s advice and use the latest stable release whenever possible.

Can you please post the compiler (with version) that you are using?

Also, keep in mind that the collective variables functions are not GPU-parallelized. Many of them are highly non-linear functions of the atomic coordinates that need to be computed in double precision, which makes GPU porting less rewarding. You could only benefit from accelerating the force-field functions, which in your case is not possible either.

Giacomo

Hi all,

Thanks a lot for your valuable inputs. My compiler is g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36).
@Axel I tried compiling the 7 Aug 2019 version of LAMMPS with KOKKOS, but it still gives me the same warning and error as in my previous mail.

The good thing is that I was able to compile the 7 Aug 2019 version of LAMMPS with the GPU package (double precision, for colvars) successfully. I modified my pair potentials and was able to achieve good performance boosts on test scripts as well.
But when I run equilibration, I get the following error message:

Cuda driver error 700 in call at file '/home/shaunak/lammps_test_7thAug19_sm_60/lammps-stable/lammps-7Aug19/lib/gpu/geryon/nvd_timer.h' in line 76.

The pair-style I am using in my input script (attached in my previous mail) is: lj/charmm/coul/long 10 12
Any help to figure out this issue would be much appreciated.

Thank you for your time.
Madhur Aggarwal

Hi all,

Thanks a lot for your valuable inputs. My compiler is g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36).
@Axel I tried compiling the 7 Aug 2019 version of LAMMPS with KOKKOS, but it still gives me the same warning and error as in my previous mail.

i suspect this is not an issue with gcc (it works fine in our CentOS 7 integration and compilation tests) but rather with nvcc. it is straightforward to address and should compile with the following change:

diff --git a/lib/colvars/colvarbias_alb.cpp b/lib/colvars/colvarbias_alb.cpp
index 187ecc363..4f6d0ecd3 100644
--- a/lib/colvars/colvarbias_alb.cpp
+++ b/lib/colvars/colvarbias_alb.cpp
@@ -388,7 +388,7 @@ std::ostream & colvarbias_alb::write_traj(std::ostream &os)
   for (size_t i = 0; i < means.size(); i++) {
     os << " "
        << std::setprecision(cvm::cv_prec) << std::setw(cvm::cv_width)
-       << -2. * (means[i] / (static_cast<cvm::real>(colvar_centers[i])) - 1) * ssd[i] / (fmax(update_calls,2) - 1);
+       << -2.0 * (means[i] / (static_cast<cvm::real>(colvar_centers[i])) - 1.0) * ssd[i] / (fmax((double)update_calls,2.0) - 1.0);
   }

The good thing is that I was able to compile the 7 Aug 2019 version of LAMMPS with the GPU package (double precision, for colvars) successfully. I modified my pair potentials and was able to achieve good performance boosts on test scripts as well.
But when I run equilibration, I get the following error message:

Cuda driver error 700 in call at file '/home/shaunak/lammps_test_7thAug19_sm_60/lammps-stable/lammps-7Aug19/lib/gpu/geryon/nvd_timer.h' in line 76.

The pair-style I am using in my input script (attached in my previous mail) is: lj/charmm/coul/long 10 12
Any help to figure out this issue would be much appreciated.

no idea what that is caused by.

axel.

[…]

Cuda driver error 700 in call at file '/home/shaunak/lammps_test_7thAug19_sm_60/lammps-stable/lammps-7Aug19/lib/gpu/geryon/nvd_timer.h' in line 76.

The pair-style I am using in my input script (attached in my previous mail) is: lj/charmm/coul/long 10 12
Any help to figure out this issue would be much appreciated.

a little googling shows that other people have hit this error message with LAMMPS when they were running a simulation with too many atoms per GPU, i.e. they were exhausting the available GPU memory and thus causing the equivalent of a segmentation fault on the GPU.
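one quick way to check whether that is what is happening (assuming nvidia-smi is available on the compute node) is to watch GPU memory while the run starts:

```shell
# sample GPU memory use once per second while the job ramps up;
# if memory.used climbs toward memory.total right before the crash,
# the run is exhausting GPU memory
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
```

if memory is indeed the problem, running fewer atoms per GPU (e.g. spreading the system over more GPUs or more nodes) should make the error go away.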

axel.