GPU Package of LAMMPS

sarah · April 7, 2022, 7:18pm

Hello all,
I am doing some scaling tests comparing GPU and CPU performance, and I have a few questions. I would greatly appreciate it if you experts could help me.

1. Which calculations run more efficiently with the GPU package in LAMMPS than on the CPU alone?
2. For better performance and faster completion of a simulation, how many GPU nodes and how many CPUs should I use?
3. I compiled the latest LAMMPS version and enabled the GPU package with "GPU_ARCH=sm_80". Can I run simulations with the GPU package on a different GPU architecture, like sm_60?
4. I compiled the latest LAMMPS version and enabled the GPU package with "GPU_PREC=double". Can I run simulations with the GPU package at a different precision, like single?

akohlmey · April 7, 2022, 7:34pm

1. You need a certain minimum system size to fully utilize the GPU; otherwise there are not enough “work units”.
GPU acceleration is most effective for pair styles, and among those it is generally more effective for styles with higher computational complexity and “expensive” math (exponentials, power functions, etc.) than for styles that only need multiplication and addition (e.g. Lennard-Jones). Thus many-body and aspherical potentials get a large boost. Probably one of the most GPU-optimized pair styles is the SNAP pair style (but that requires the KOKKOS package, not the GPU package).

At the other end of the spectrum is the PPPM kspace style, which is only partially GPU accelerated. In many cases it is better not to use GPU acceleration for it, but instead to run it on the CPU concurrently with the pair style (which runs very well on the GPU).
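Whether a system is large enough to keep the GPUs busy is easiest to judge from the scaling tests themselves. As a minimal sketch (a Python post-processing helper, not part of LAMMPS; the function name and the dictionary input format are hypothetical), measured wall-clock times for a fixed-size system can be turned into speedup and parallel efficiency relative to the smallest run. Efficiency falling well below 1.0 as GPUs are added suggests each GPU no longer has enough work.

```python
def strong_scaling(timings):
    """Compute speedup and parallel efficiency for a strong-scaling test.

    timings: dict mapping a resource count (GPUs or nodes) -> wall-clock
             seconds, all for the SAME system size. The smallest count is
             used as the baseline.
    Returns a list of (count, time, speedup, efficiency), sorted by count.
    """
    if not timings:
        raise ValueError("need at least one timing")
    counts = sorted(timings)
    base_n = counts[0]
    base_t = timings[base_n]
    rows = []
    for n in counts:
        t = timings[n]
        speedup = base_t / t          # how much faster than the baseline run
        ideal = n / base_n            # speedup expected with perfect scaling
        efficiency = speedup / ideal  # 1.0 means perfect strong scaling
        rows.append((n, t, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical placeholder timings, NOT measured data: replace them with
    # the "Loop time" values reported by your own LAMMPS runs.
    example = {1: 100.0, 2: 55.0, 4: 35.0}
    for n, t, s, e in strong_scaling(example):
        print(f"{n:3d} GPUs  {t:8.1f} s  speedup {s:5.2f}  efficiency {e:5.2f}")
```

The same helper works for CPU-only runs (counting MPI ranks or nodes instead of GPUs), which makes it straightforward to compare both kinds of runs for the same input.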

2. That depends entirely on the system size and the hardware you have available. The GPU package can benefit from “oversubscribing” the GPUs (i.e. using 2-4 MPI ranks per GPU), as that increases GPU utilization and at the same time parallelizes the parts that are not GPU accelerated. Optimal values depend on the hardware and the simulated system. You can also combine it with OpenMP. For KOKKOS, that is the best option.
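To plan the oversubscription runs described above, a minimal Python sketch can enumerate candidate layouts: for 1 to 4 MPI ranks per GPU it computes the total number of MPI ranks and how many cores (e.g. OpenMP threads) each rank could use without exceeding the node's core count. The function name, the node/GPU/core counts, and the input file name "in.bench" are hypothetical. The printed command assumes a CMake build whose executable is named lmp, an mpirun-style launcher, and the standard LAMMPS switches -sf gpu and -pk gpu Ngpu (Ngpu = GPUs per node); adapt it to your cluster.

```python
def gpu_layouts(nodes, gpus_per_node, cores_per_node,
                max_ranks_per_gpu=4, input_file="in.bench"):
    """Enumerate MPI-rank / thread layouts for GPU-package scaling runs.

    Assumes every node has the same number of GPUs and CPU cores, and that
    ranks are spread evenly so each GPU is shared by the same number of ranks.
    Returns one dict per ranks-per-GPU choice that still fits on the cores.
    """
    if min(nodes, gpus_per_node, cores_per_node, max_ranks_per_gpu) < 1:
        raise ValueError("all counts must be >= 1")
    layouts = []
    for rpg in range(1, max_ranks_per_gpu + 1):
        ranks_per_node = gpus_per_node * rpg
        threads = cores_per_node // ranks_per_node  # cores available per rank
        if threads < 1:
            break  # more ranks than cores per node; larger rpg is worse
        total_ranks = nodes * ranks_per_node
        # Hypothetical launcher line; "in.bench" is a placeholder input file.
        cmd = (f"mpirun -np {total_ranks} lmp -sf gpu -pk gpu {gpus_per_node} "
               f"-in {input_file}")
        layouts.append({"ranks_per_gpu": rpg,
                        "mpi_ranks": total_ranks,
                        "threads_per_rank": threads,
                        "command": cmd})
    return layouts


if __name__ == "__main__":
    # Hypothetical hardware: 2 nodes, each with 4 GPUs and 64 CPU cores.
    for layout in gpu_layouts(nodes=2, gpus_per_node=4, cores_per_node=64):
        print(layout["ranks_per_gpu"], layout["threads_per_rank"],
              layout["command"])
```

The threads_per_rank value is only an upper bound; how threads are passed to the non-GPU parts and how ranks are bound to cores depend on your build and MPI launcher. Timing each layout on the real system is what identifies the optimum.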

3. When compiling via CMake, the GPU package builds so-called “fat” GPU kernels, i.e. kernels compiled for all architectures supported by your CUDA toolkit. Those should work without recompilation. The same is true when compiling for OpenCL instead of CUDA. In that mode, you can run not only on Nvidia GPUs, but also on AMD GPUs (without HIP, even though HIP is supported as well) and on Intel GPUs.

For KOKKOS, you must compile for the exact target architecture. If you don’t, it will look as if your job is “stuck”, because the runtime will try to recompile the kernels, which will eventually fail.

4. Not without changing the configuration and recompiling.

Please note that most of what you are asking about is explained in the LAMMPS manual, which has a whole section about acceleration packages. You can also find some (old) benchmark data on the LAMMPS homepage.

sarah · April 7, 2022, 7:57pm

Dear Axel,

Many thanks for your time and detailed response. I appreciate your help and your responsiveness.