GPU kernel call structure in LAMMPS

I am trying to understand how GPU kernels are launched. For example, I want to know where in the code the kernel k_charmm_long is called. When I grep for that name, I find the following definition in ./lib/gpu/lal_charmm_long.cu:

__kernel void k_charmm_long...

and a call like this:

  success=this->init_atomic(nlocal,nall,max_nbors,maxspecial,cell_size,gpu_split,
                            _screen,charmm_long,"k_charmm_long");

in ./lib/gpu/lal_charmm_long.cpp

Is that correct? How are the grid and block sizes passed to the kernel, given that I do not see the traditional launch syntax like k_charmm_long<<<g, b>>>()? I want to know this so that I can insert profiling calls before and after the kernel launch.

Any idea about that?

Have you read the paper(s) describing the GPU package implementation in LAMMPS?

There you should find an explanation of the general architecture of the GPU package. More specifically, look at the functions and macros in the lib/gpu/geryon folder, which serve as an abstraction layer for (originally) CUDA and OpenCL, and later also for HIP on AMD GPUs. There is also a README file in that folder with some basic explanation, but the links in it are likely outdated, since the author has since moved on.
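To illustrate the general idea of such an abstraction layer, here is a minimal host-only C++ sketch. It is not the Geryon API: the names KernelHandle, set_size, run, profiled_run and run_demo are hypothetical and were made up for this example, and the "kernel" is just a CPU loop. The sketch only shows the shape of the pattern, where the launch geometry is stored on a kernel object instead of being written as <<<g, b>>>, so that the single place that triggers the launch is also a natural place to put profiling calls. Check the sources in lib/gpu/geryon for the actual classes and method names.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one profiled launch (not part of LAMMPS or Geryon).
struct LaunchRecord {
  std::string name;
  std::size_t num_blocks;
  std::size_t block_size;
  double seconds;
};

// Hypothetical stand-in for a backend-neutral kernel object.
// A real backend would call the CUDA, OpenCL or HIP runtime inside run();
// here the "device" is emulated with two nested loops on the host.
class KernelHandle {
 public:
  using Body = std::function<void(std::size_t block_id, std::size_t thread_id)>;

  KernelHandle(std::string name, Body body)
      : name_(std::move(name)), body_(std::move(body)) {}

  // Stores the launch geometry; plays the role of <<<num_blocks, block_size>>>.
  void set_size(std::size_t num_blocks, std::size_t block_size) {
    num_blocks_ = num_blocks;
    block_size_ = block_size;
  }

  // Executes the body once per (block, thread) pair.
  void run() const {
    assert(block_size_ > 0 && "set_size() must be called before run()");
    for (std::size_t b = 0; b < num_blocks_; ++b)
      for (std::size_t t = 0; t < block_size_; ++t) body_(b, t);
  }

  const std::string &name() const { return name_; }
  std::size_t num_blocks() const { return num_blocks_; }
  std::size_t block_size() const { return block_size_; }

 private:
  std::string name_;
  Body body_;
  std::size_t num_blocks_ = 0;
  std::size_t block_size_ = 0;
};

// Profiling hook wrapped around the launch: timestamps before and after run().
LaunchRecord profiled_run(const KernelHandle &k) {
  const auto t0 = std::chrono::steady_clock::now();
  k.run();
  const auto t1 = std::chrono::steady_clock::now();
  return {k.name(), k.num_blocks(), k.block_size(),
          std::chrono::duration<double>(t1 - t0).count()};
}

struct DemoResult {
  LaunchRecord record;
  std::size_t items_touched;  // work items visited exactly once
};

// Launches a toy kernel over n items and reports the recorded geometry.
DemoResult run_demo() {
  const std::size_t n = 1000;
  const std::size_t block_size = 128;
  const std::size_t num_blocks = (n + block_size - 1) / block_size;  // round up

  std::vector<int> hits(n, 0);
  KernelHandle k("k_demo", [&](std::size_t b, std::size_t t) {
    const std::size_t i = b * block_size + t;  // global index
    if (i < n) hits[i] += 1;                   // guard against the padded tail
  });

  k.set_size(num_blocks, block_size);
  const LaunchRecord rec = profiled_run(k);
  const std::size_t touched =
      static_cast<std::size_t>(std::count(hits.begin(), hits.end(), 1));
  return {rec, touched};
}
```

Calling run_demo() from a main function gives you the kernel name, the grid and block sizes that were set, and the elapsed host time. For a real GPU backend, a host timer like this only measures the launch unless you synchronize with the device first, so for actual profiling you would use the timing or event facilities of the backend at the point where the launch is issued.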