GPU implementation structure

Yolk · November 14, 2023, 10:59am

Hello,
I would like to write a GPU implementation for LAMMPS. I have started reading some of the existing GPU examples, but I cannot understand where the GPU implementation actually is.

If we take for instance the following file:

https://github.com/lammps/lammps/blob/781eadc9c3206d210565b9cf9ae3b0a31471beb7/src/GPU/pair_dpd_gpu.cpp

/* ----------------------------------------------------------------------
   LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
   https://www.lammps.org/, Sandia National Laboratories
   LAMMPS development team: developers@lammps.org

   Copyright (2003) Sandia Corporation.  Under the terms of Contract
   DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
   certain rights in this software.  This software is distributed under
   the GNU General Public License.

   See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */

/* ----------------------------------------------------------------------
   Contributing author: Trung Dac Nguyen (ORNL)
------------------------------------------------------------------------- */

#include "pair_dpd_gpu.h"

#include "atom.h"

I see:
void PairDPDGPU::cpu_compute()
That seems to be the usual CPU implementation, but it is not called when using the GPU.

The other methods are:
int **dpd_gpu_compute_n
void dpd_gpu_compute

But I cannot find any implementation of those. Could you help me understand how it works and how to write a GPU fix? Where can I find more documentation about GPU implementations?

Thanks

akohlmey · November 15, 2023, 6:10am

GPU implementation of what?

The crucial hint is in the build instructions, for example here:
https://docs.lammps.org/Build_extras.html#traditional-make

The GPU package has no example for implementing a “fix”. It focuses on accelerating only the parts of the computation that benefit the most from a GPU: the neighbor list build and the computation of the pair style. Other force computations happen concurrently on the CPU. Fixes and computes rarely contribute significantly to the total time, especially for smaller systems. The LAMMPS GPU package allows oversubscribing GPUs, i.e. attaching multiple MPI processes to the same GPU (usually 2-6), to take advantage of the fact that typical GPU compute nodes these days have several times more CPU cores than GPUs.

If you are looking for a way to run the entire computation on the GPU, you should look at the KOKKOS package instead, since it follows a different approach.

There is no detailed documentation about the implementation beyond the publications that LAMMPS lists when running a GPU package accelerated calculation, and the source code and README files in lib/gpu.
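
To illustrate why functions like dpd_gpu_compute appear in pair_dpd_gpu.cpp without a body, here is a minimal sketch of the general pattern: the pair style file only declares the function, and the definition lives in a separately built library that is linked in later (for the GPU package that would be the library built from lib/gpu). All names here (toy_gpu_compute, PairToyGPU) and the toy force law are hypothetical and only for illustration; the real LAMMPS signatures are much longer, and the real library launches GPU kernels instead of running a host loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- "Package side" (plays the role of a file in src/GPU) ----
// Declaration only: no body in this part. The linker resolves it against
// the library. Hypothetical name, not a LAMMPS function.
void toy_gpu_compute(const std::vector<double> &x, std::vector<double> &f);

// Hypothetical pair style wrapper: it only forwards data to the library.
class PairToyGPU {
 public:
  void compute(const std::vector<double> &x, std::vector<double> &f) const {
    assert(x.size() == f.size());   // one force entry per coordinate
    toy_gpu_compute(x, f);          // implemented elsewhere
  }
};

// ---- "Library side" (plays the role of the separately built library) ----
// In a real GPU library this is where data would be copied to the device
// and a kernel launched; a plain host loop stands in for that here.
void toy_gpu_compute(const std::vector<double> &x, std::vector<double> &f) {
  for (std::size_t i = 0; i < x.size(); ++i) {
    f[i] = -2.0 * x[i];             // toy force law, assumption for the demo
  }
}
```

In this sketch both halves sit in one file so that it compiles on its own. In a real build they would be separate translation units, which is why searching the pair style source alone does not turn up the implementation.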