GPU package accelerates what, exactly?

Howdy,

I am using the GPU package with Intel PVC GPUs. I want to be able to determine which tasks are being performed on the GPU and which aren’t.

I have an input file with the pair style lj/cut, and I use the -sf gpu command-line option. So I'm fairly sure that at least that pair style is being accelerated, because it's on the list of GPU-enabled styles.
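For reference, my understanding (a sketch; the exact device count depends on the hardware) is that the command-line switches are equivalent to putting these commands at the top of the input script:

```
package gpu 1   # same as -pk gpu 1: use one GPU device per node
suffix gpu      # same as -sf gpu: substitute /gpu style variants where they exist
```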

I don’t see any evidence in the LAMMPS log file of when the suffix is being applied. I do see evidence that both the CPUs and the GPUs are being utilized during the run, and the performance increases when I add more of either kind of resource.

But how do I know for sure? Is there a verbose flag I can turn on to learn more about these things? I really hope this doesn't end up with me having to read the code and figure it out for myself.

LAMMPS is on the develop branch from late February and I build it using the Makefiles.

You don’t have to read the source code, but a look into the documentation would be helpful.

Please see "7. Accelerate performance" and "5.8. Pair_style potentials" in the LAMMPS documentation.

I found these statements.

https://docs.lammps.org/Speed_compare.html

  • The GPU package moves per-atom data (coordinates, forces, and (optionally) neighbor list data, if not computed on the GPU) between the CPU and GPU at every timestep.
  • Both KOKKOS and GPU package compute bonded interactions (bonds, angles, etc) on the CPU.
  • The GPU package requires neighbor lists to be built on the CPU when using exclusion lists, or a triclinic simulation box.

https://www.lammps.org/bench.html

  • The curve for the GPU package shows this performance hit may be possible to partially overcome, since the GPU package also moves data back and forth every step to perform time integration on the CPU.

These statements tell me that the data gets moved to the CPU on every time step, but they don't tell me what work the CPU is actually doing.

The crucial information is in the second link I provided. It shows which pair styles are accelerated by which package, if any. The same applies to command, bond/angle/dihedral/improper, fix, and compute styles.

By default, everything is done on the CPU unless it is explicitly shown to be GPU accelerated. For the GPU package, only pair styles and parts of PPPM are accelerated; the rest runs on the CPU, concurrently with the GPU. You will see a message printed for each kernel that is downloaded to the GPU.
For the KOKKOS package, data is transferred as needed. The best performance is achieved if all styles in use are accelerated, so the data can stay on the GPU. It will print warnings if this is not the case.
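Those per-kernel messages can be filtered out of a run log with a simple grep. A minimal sketch (the sample lines below are copied from the output quoted later in this thread; on a real run you would grep your own log.lammps):

```shell
# Write a small sample log excerpt, then filter for the GPU package messages.
cat > log.sample <<'EOF'
- Using acceleration for lj/cut:
-  with 28 proc(s) per device.
Neigh mode:      Hybrid (binning on host) with subgroup support
EOF
grep -E "Using acceleration|Neigh mode" log.sample
```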

If you want more specific details you need to first explain what you need that information for.

Thank you for your patience with my questions.

I was able to find these four acceleration messages in my LAMMPS output.

- Using acceleration for lj/cut (for the LJ benchmark)
- Using acceleration for eam (for the EAM benchmark)
- Using acceleration for lj/charmm/coul/long (for the Rhodopsin benchmark)
- Using acceleration for pppm (for the Rhodopsin benchmark)

all with identical acceleration details:

-  with 28 proc(s) per device.
-  with 1 thread(s) per proc.
-  with OpenCL Parameters for: INTEL_GPU (500)
-  Horizontal vector operations: ENABLED
-  Shared memory system: No

So now I know what is being calculated on the GPU; that's progress.

I still don't understand what the CPUs are doing. Are they handling the atom_modify work? Are they building and modifying the neighbor lists?

I was able to find this message in the LAMMPS output.

Neigh mode:      Hybrid (binning on host) with subgroup support

Does that imply that the CPUs are involved in maintaining the neighbor list?

Both KOKKOS and GPU package compute bonded interactions (bonds, angles, etc) on the CPU.

Actually you can compute bonded interactions on the GPU with the KOKKOS package now, assuming the styles are ported to Kokkos.

With the GPU package, the CPU does everything else. For example, fix nve, fix npt, or any other time integrator, fix shake, computing FFTs for PPPM, etc.
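With that division of labor, a minimal input fragment would behave like this (a sketch of my own, based on the explanation in this thread, not an authoritative statement):

```
# run with: -sf gpu on the command line
pair_style  lj/cut 2.5   # the suffix turns this into lj/cut/gpu: pair forces on the GPU
fix         1 all nve    # time integration stays on the CPU with the GPU package
```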

This page https://docs.lammps.org/Commands_fix.html
seems to indicate that fix nve and fix npt can be run on the GPU with the GPU package.
I assume the suffix command would catch those automatically, but I can't tell whether that actually happened.

They don't. Those styles carry the /gpu suffix but only use multi-threading.