GPU Package Issues

Hi all.

I’m trying to accelerate my simulations using the GPU package, and while LAMMPS can identify the GPU, it’s apparently not using it (according to a system admin). I’m using a single GPU and 24 processors on a computing node.

I’ve tried to check whether I’m using any GPU-capable commands or potentials, and it seems I’m not (surprisingly, the harmonic angle and bond potentials and the pair style ‘lj/cut/soft’ have no gpu variants).

But since I’m using pair potentials and ‘fix bond/react’, I should be building plenty of neighbor lists. Do I need to explicitly tell LAMMPS to use the GPU for neighbor lists, e.g. something like ‘package gpu 2 split -1 neigh yes’?
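
For reference, here is roughly what I think the input deck and launch command would look like, based on my reading of the docs (just my sketch, not verified; ‘in.script’ is a placeholder for my input file):

    package gpu 1 neigh yes split -1   # 1 GPU per node, GPU neighbor builds, dynamic load balancing
    suffix gpu                         # use the /gpu variant of any style that has one

or equivalently on the command line:

    mpirun -np 24 lmp -sf gpu -pk gpu 1 -in in.script

Is that the right approach?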

I’m just trying to accelerate these simulations. We’re trying to collect a lot of data exploring parameter space, and a simulation of ~40,000 atoms is taking anywhere from 12 hours to 2 days, and I can’t figure out why. Getting the GPU functionality under control would help. Reading the doc page was not very enlightening, so tips and specific resources are appreciated.

Best,
Brian

Brian,

you need to read the manual more carefully. there are detailed discussions of how the GPU package and acceleration in general work, and of which kinds of styles are accelerated.

it is NOT the entire LAMMPS code that is run on the GPU, but rather (some) pair potentials, (most of the time) neighbor lists, and parts of pppm. the core “driver” part of LAMMPS is always run on the CPU, since LAMMPS is an MPI parallel code and thus the MPI communication and all base functionality is run on the CPU (there are a few GPU to GPU communication optimizations, but those are for special cases and still need to be driven from the CPU).
which pair styles are accelerated in the GPU package can easily be seen from the pair style overview page: they are marked with a “g” in parentheses:
https://lammps.sandia.gov/doc/Commands_pair.html
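
for example, a style marked with “g” can be requested explicitly by appending /gpu, or globally via the suffix command (a sketch, assuming a GPU-enabled build):

    pair_style lj/cut/gpu 2.5   # explicit gpu variant of lj/cut
    # or:
    suffix gpu                  # uses the plain style wherever no /gpu variant exists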

the GPU package deliberately does not accelerate bonded interactions or fixes or anything like that, because those have a much smaller potential to be accelerated by GPUs and would require a different data model. for certain system sizes, this is very efficient, and it is particularly effective on GPU nodes with a significant number of CPU cores, as the GPU part and the CPU part of the force computation can be executed concurrently and benefit from MPI parallelization (you may oversubscribe the GPU, and that may lead to better GPU utilization, too). the KOKKOS package implements a different strategy, where the data is kept on the GPU as much as possible. this requires more porting effort and is faster for simple calculations, where no data needs to be moved between GPU and CPU, but slower for fixes, computes, and styles that are not ported, as more data needs to be sent back and forth.
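
as a rough illustration, the two packages are typically enabled from the command line like this (a sketch; exact flags and executable name depend on how your LAMMPS binary was built):

    # GPU package: many MPI ranks may share one GPU
    mpirun -np 24 lmp -sf gpu -pk gpu 1 -in in.script

    # KOKKOS package: usually one MPI rank per GPU
    mpirun -np 1 lmp -k on g 1 -sf kk -in in.script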

whether you are actually using a GPU accelerated style (and which one) should be reported in the output.

there is only a subset of styles ported to the GPU package (or KOKKOS for that matter), because it takes time and effort and most people that contribute code to LAMMPS do not also provide accelerated versions (or only for a subset of the accelerator packages).

fix bond/react will request a neighbor list of its own. if you run the pair style on the GPU, that usually will also run the neighbor list build for the pair style on the GPU. since fix bond/react requests its own neighbor list, it will then also trigger a neighbor list build on the CPU in that case. you may instead request that the GPU package build the neighbor list on the CPU only and then transfer it to the GPU. whether that is faster than building the neighbor list on both the GPU and the CPU is impossible to predict.
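
for example (a sketch):

    package gpu 1 neigh no   # build neighbor lists on the CPU and copy them to the GPU

you would have to benchmark both settings on your system to see which one wins.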

> the KOKKOS package implements a different strategy, where the data is kept on the GPU as much as possible. this requires more porting effort and is faster for simple calculations, where no data needs to be moved between GPU and CPU, but slower for fixes, computes, and styles that are not ported, as more data needs to be sent back and forth.

This isn’t necessarily true. The Kokkos package can run any fix or compute style, as well as packing comm buffers, on the CPU, just like the GPU package, to avoid data transfer. At least in the tests that I’ve done with fix nve and comm on the CPU, the performance is very similar. There is no fundamental reason why the GPU package should be faster.
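
For example, something along these lines (assuming a Kokkos-enabled build; a sketch, not a benchmarked recipe):

    # run the pair style on the GPU, but pack/unpack comm buffers on the host
    mpirun -np 1 lmp -k on g 1 -sf kk -pk kokkos comm host -in in.script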

Stan
