Porting inhouse potential to kokkos follow up

Hi all, I have done converting kokkos supported potential files, but before I test it. I want to double check a few things. Also thank you Stan and Axel, the video on LAMMPS master class for kokkos is extremely helpful.

First, for the system that our group is modeling it requires three pair styles overlay:

pair_style hybrid/overlay  12lj/cut/coul/long 8.0 8.0 bv 2.0 8.0 bvv 2.0 8.0

That shouldn’t affect the acceleration from using the kokkos version of all three styles right? I could also use the kokkos version for two styles, while using the regular potential for the other one to gain some acceleration?

Second, the bond valence pair style (the one I am working on) is really similar to EAM pair style, in which we use the density on each atom to compute forces, energy, and virial tensor components in EAM, whereas we use bond valence on each atom to compute forces, energy, and virial tensors components in bond valence potential. The major difference is that variables in EAM are initialized from reading a file, and variables in BV are initialized from input arguments, which is similar to pair_lj_cut, so I used pair_lj_cut_kokkos files also as some guidance.

I want to double check on my syntax when initializing, and using these variables(members in params_bv stucture, energy0, and cutsq)using kokkos.
Starting off with: a related snippet of pair_bv_kokkos.h :

public:
 
 struct params_bv{
    KOKKOS_INLINE_FUNCTION
    params_bv() {cut=0;cutsq=0;r0=0;alpha=0;sparam=0;v0=0;offset=0;};
    KOKKOS_INLINE_FUNCTION
    params_bv(int /*i*/) {cut=0;cutsq=0;r0=0;alpha=0;sparam=0;v0=0;offset=0;};
    F_FLOAT cut,cutsq,r0,alpha,sparam,v0,offset;
  }

 protected:

  Kokkos::DualView<params_bv**,Kokkos::LayoutRight,DeviceType> k_params;
  //params is the unmanaged/device view of the dual view
  typename Kokkos::DualView<params_bv**,Kokkos::LayoutRight,DeviceType>::t_dev_const_um params;
  //m_params is an instance of params_bv stucture 
  params_bv m_params[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];  
  
  DAT::tdual_ffloat_1d k_energy0;
  typename AT::t_ffloat_1d d_energy0;
  F_FLOAT m_energy0[MAX_TYPES_STACKPARAMS+1];
  
  F_FLOAT m_cutsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
  typename AT::tdual_ffloat_2d k_cutsq;
  typename AT::t_ffloat_2d d_cutsq;

allocate function:

template<class DeviceType>
void PairBVKokkos<DeviceType>::allocate()
{
  PairBV::allocate();
  int n = atom->ntypes;
  memory->destroy(cutsq);
  memoryKK->create_kokkos(k_cutsq,cutsq,n+1,n+1,"pair:cutsq");
  d_cutsq = k_cutsq.template view<DeviceType>();
    
  k_params = Kokkos::DualView<params_bv**,Kokkos::LayoutRight,DeviceType>("PairLJCut::params",n+1,n+1);
  params = k_params.template view<DeviceType>();
    
  k_energy0 = DAT::tdual_ffloat_1d("pair:energy0",n+1);
  d_energy0 = k_energy0.template view<DeviceType>();

}

init_one function

template<class DeviceType>
double PairBVKokkos<DeviceType>::init_one(int i, int j)
{
  double cutone = PairBV::init_one(i,j);

  k_params.h_view(i,j).r0 = r0[i][j];
  k_params.h_view(i,j).alpha = alpha[i][j];
  k_params.h_view(i,j).sparam = sparam[i][j];
  k_params.h_view(i,j).v0 = v0[i][j];
  k_params.h_view(i,j).offset = offset[i][j];
  k_params.h_view(i,j).cutsq = cutone*cutone;
  k_params.h_view(j,i) = k_params.h_view(i,j);
  if (i<MAX_TYPES_STACKPARAMS+1 && j<MAX_TYPES_STACKPARAMS+1) {
    m_params[i][j] = m_params[j][i] = k_params.h_view(i,j);
    m_cutsq[j][i] = m_cutsq[i][j] = cutone*cutone;
  }
  if (i==j){
    k_energy0.h_view(i) = energy0[i];
  }

  k_cutsq.h_view(i,j) = k_cutsq.h_view(j,i) = cutone*cutone;
  k_cutsq.template modify<LMPHostType>();
  k_params.template modify<LMPHostType>();
  k_energy0.template modify<LMPHostType>();

  return cutone;
}

snippet of the compute function :

  atomKK->sync(execution_space,datamask_read);
  k_cutsq.template sync<DeviceType>();
  k_params.template sync<DeviceType>();
  k_energy0.template sync<DeviceType>();

snippet of an operator on how I use these variables:

    const F_FLOAT rsq = delx*delx + dely*dely + delz*delz;
    
    //STACK memory can be faster too many types falls back to global memory
    if((STACKPARAMS?m_params[itype][jtype].alpha:params(itype,jtype).alpha)!=0.0){
        if (rsq < (STACKPARAMS?m_cutsq[itype][jtype]:d_cutsq(itype,jtype))) {
          F_FLOAT recip = 1.0/sqrt(rsq);
          F_FLOAT r = sqrt(rsq);
          s0tmp += pow((STACKPARAMS?m_params[itype][jtype].r0:params(itype,jtype).r0)/r,(STACKPARAMS?m_params[itype][jtype].alpha:params(itype,jtype).alpha));
          if (NEWTON_PAIR || j < nlocal) {
            a_s0[j] += pow((STACKPARAMS?m_params[jtype][itype].r0:params(jtype,itype).r0)/r,(STACKPARAMS?m_params[jtype][itype].alpha:params(jtype,itype).alpha));
          }
        }
    }

the snippet of a different operator on how I use these variables:

  F_FLOAT s = d_s0[i] - (STACKPARAMS?m_params[itype][itype].v0:params(itype,itype).v0);
  F_FLOAT ss = s*s;
  d_fp[i] = (STACKPARAMS?m_params[itype][itype].sparam:params(itype,itype).sparam)*power_global*s;
  if (EFLAG) {
    F_FLOAT phi = (STACKPARAMS?m_params[itype][itype].sparam:params(itype,itype).sparam)*ss+(STACKPARAMS?m_energy0[itype]:d_energy0(itype));
    if (eflag_global) ev.evdwl += phi;
    if (eflag_atom) d_eatom[i] += phi;
  }

Could you let me know whether I have initialized and used variables in params_bv stucture, energy0, and cutsq correctly? Thank you!

Sorry I was out of office all last week. The KOKKOS package fully supports pair_style hybrid. There will be a little overhead for lightweight potentials/small systems because pair_style hybrid requires multiple kernel launches and multiple loops to compute the force, in which some data like the neighbor lists are read multiple times. However this overhead is likely small for most cases.

STACKPARAMS uses stack memory which is a limited resource of GPUs, so for manybody potentials with many parameters, we don’t typically use it because it won’t fit. However, if your model only has a few parameters then it should be fine and could give a little speedup vs Kokkos views in global memory. It looks like you are on the right track. It is hard to tell from the snippet, but if applicable, remember to use atomics or a ScatterView if both thread i and j are writing to the same index, e.g. for forces with a half neighbor list.

Let us know if you have more specific questions. Otherwise I’d keep going and try to compile the new code, and then run it.

1 Like

Thanks! I have compiled and ran successfully, but results are a bit different from the CPUs I will make another post about it.

1 Like