GPU Support for fix temp/rescale?

Gabriele · September 26, 2023, 9:29pm

Hello,
One of my users employs the fix temp/rescale algorithm. I know that GPU support has been added for fix nve and fix nvt. I was wondering whether GPU support for fix temp/rescale is planned or even under development at this point. Thanks and greetings,
Gabriele

akohlmey · September 26, 2023, 9:40pm

The GPU package does not accelerate fixes at all, so the lack of a GPU version of fix temp/rescale does no harm there.

This is different for the KOKKOS package, which is not only for GPU support, but also supports multi-threading.

There are currently no plans to port fix temp/rescale to KOKKOS. Technically, it should not be a big problem. But then again, scientifically speaking, using fix temp/rescale in any kind of production simulation is a very, very bad idea. This thermostat algorithm has many flaws.

If your user doesn’t want to forgo using it, there is always the option to use the GPU package instead of KOKKOS. When accuracy doesn’t matter so much (or else fix temp/rescale would not be justified), then there is a lot of additional performance potential in using the GPU package, since it can be compiled for mixed or even single precision and achieve higher acceleration compared to KOKKOS, which requires full double precision.

Gabriele · September 26, 2023, 10:00pm

This is great information! Thank you Axel. When I was talking about GPU support, I implicitly meant GPU support via Kokkos. I will propose to the user to instead use the libgpu.a support, with the caveats you pointed out.
I guess there is a special run-time parameter for LAMMPS to specify use of libgpu? Also, I did not quite understand why you say that the GPU package does not accelerate fixes at all, but you would still expect a performance improvement for fix temp/rescale?
Thanks again, Gabriele

akohlmey · September 26, 2023, 10:14pm

The KOKKOS and GPU packages take different approaches to GPU acceleration. The GPU package follows the older “GPU as accelerator” approach: only calculations that benefit significantly from GPU acceleration (primarily neighbor list builds and pair styles) have been ported, and everything else runs on the CPU, a chunk of it even concurrently with the GPU. There is also acceleration for parts of PPPM, but that is rarely helpful outside of serial execution. This approach only moves position data to the GPU and force data back. The KOKKOS package, on the other hand, follows the “CPU as a decelerator” approach and keeps all data on the GPU as much as possible. This is why non-accelerated fixes cause a performance penalty for KOKKOS, but not for GPU.

The GPU package can further benefit from oversubscribing GPUs (especially modern, powerful ones, particularly when run with the CUDA MPS daemon) because it can utilize the more efficient MPI parallelization for the parts that are not GPU accelerated. Since the Pair and Neigh parts often consume over 90% of the total computational effort, the GPU package approach is not as bad as you might think without knowing the details of the implementation.

The fix temp/rescale computation itself takes very little time. The reason to port such fixes to KOKKOS is not the acceleration, but the cost of having to move data back and forth between CPU and GPU.
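
To put the "over 90%" figure for Pair and Neigh in perspective, here is a rough back-of-the-envelope sketch using Amdahl's law. It is only an illustration: the function name `overall_speedup` and the kernel speedup values are hypothetical and not measured LAMMPS numbers, and the simple formula ignores that the GPU package runs some CPU work concurrently with the GPU.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law estimate of the whole-run speedup.

    accelerated_fraction: share of the original run time spent in the
        parts that are offloaded (e.g. 0.90 for Pair + Neigh).
    kernel_speedup: assumed speedup of those parts on the GPU
        (a hypothetical value, not a benchmark result).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Time that stays on the CPU is unchanged; the rest shrinks.
    cpu_part = 1.0 - accelerated_fraction
    gpu_part = accelerated_fraction / kernel_speedup
    return 1.0 / (cpu_part + gpu_part)


if __name__ == "__main__":
    # Assumed speedups for the offloaded 90% of the work.
    for s in (2, 5, 10, 100):
        print(f"{s:>4}x on Pair/Neigh -> {overall_speedup(0.90, s):.2f}x overall")
```

With 90% of the time offloaded, the estimate can never exceed 10x overall no matter how fast the GPU kernels are, which is why making the remaining CPU parts efficient (for example through MPI parallelization) still matters.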

akohlmey · September 26, 2023, 10:18pm

Please also see the chapter on acceleration in the LAMMPS manual, specifically section 7.5, "Comparison of various accelerator packages".

Gabriele · September 26, 2023, 10:43pm

Sounds good, I will give libgpu a try for the fix temp/rescale calculations the user is planning.