Modifying the Kokkos package to support an in-house potential

Hi, I have been working on modifying our in-house potential, called the bond valence model, to be compatible with the KOKKOS package. This potential requires two nested loops. The first nested loop (iterating over all the local atoms, and for each of them over its neighbor atoms) determines the bond valence of each atom. The second nested loop uses the predetermined bond valences of the local and neighbor atoms, as well as the distance between them, to determine their pairwise force, update the virial tensor, and update the energy.
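Schematically, the structure looks like the compilable sketch below. The class, the neighbor arrays, and the pair_force() stand-in are hypothetical names I use only for illustration; the (r0/r)^alpha accumulation echoes the formulas that appear later in this thread, and the real model is of course more involved.

// Rough serial sketch of the two-pass structure described above. All names and
// the pair_force() stand-in are placeholders, not the actual pair style code.
#include <array>
#include <cmath>
#include <vector>

struct BVSketch {
  int nlocal = 0;
  std::vector<std::vector<int>> neigh;       // neighbor indices of each local atom
  std::vector<std::array<double,3>> x, f;    // positions and forces
  std::vector<double> s0;                    // per-atom bond valence
  double r0 = 1.0, alpha = 4.0;              // single-type parameters for brevity

  // Stand-in for the real pairwise force law, which depends on the two
  // precomputed bond valences and the distance between the atoms.
  static double pair_force(double si, double sj, double rsq) { return (si + sj) / rsq; }

  void compute() {
    // Pass 1: accumulate the bond valence of every local atom from its neighbors.
    for (int i = 0; i < nlocal; ++i) {
      s0[i] = 0.0;
      for (int j : neigh[i]) {
        const double dx = x[i][0]-x[j][0], dy = x[i][1]-x[j][1], dz = x[i][2]-x[j][2];
        const double r = std::sqrt(dx*dx + dy*dy + dz*dz);
        s0[i] += std::pow(r0/r, alpha);
      }
    }
    // Pass 2: use the precomputed bond valences of i and j plus their distance
    // to get the pair force; the energy and virial tallies would go here as well.
    for (int i = 0; i < nlocal; ++i) {
      for (int j : neigh[i]) {
        const double dx = x[i][0]-x[j][0], dy = x[i][1]-x[j][1], dz = x[i][2]-x[j][2];
        const double fpair = pair_force(s0[i], s0[j], dx*dx + dy*dy + dz*dz);
        f[i][0] += dx*fpair;  f[i][1] += dy*fpair;  f[i][2] += dz*fpair;
      }
    }
  }
};

int main() {
  BVSketch s;
  s.nlocal = 2;
  s.x = {{0.0, 0.0, 0.0}, {1.1, 0.0, 0.0}};
  s.f.assign(2, {0.0, 0.0, 0.0});
  s.s0.assign(2, 0.0);
  s.neigh = {{1}, {0}};   // each atom sees the other
  s.compute();
  return 0;
}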

However, the KOKKOS package does not support this, so I modified some parts of the pair_kokkos.h file:

The first thing I modified was adding a new functor structure named ‘PairBvComputeFunctor’, which contains compute functions suitable for calculating the bond valence as well as the pairwise forces, energy, and virial from the calculated bond valence. I also adjusted the alias definitions, constructor, and contribute function within the structure accordingly.

The second thing I have done is to change some parts of the pair_compute_neighlist function.
Here is the relevant part of that function:

 if (fpair->atom->ntypes > MAX_TYPES_STACKPARAMS) {
        PairComputeFunctor<PairStyle,NEIGHFLAG,false,ZEROFLAG,Specialisation > ff(fpair,list);
        GetMaxTeamSize<typename PairStyle::device_type>(ff, inum, teamsize_max_for, teamsize_max_reduce);
      } else {
        PairComputeFunctor<PairStyle,NEIGHFLAG,true,ZEROFLAG,Specialisation > ff(fpair,list);
        GetMaxTeamSize<typename PairStyle::device_type>(ff, inum, teamsize_max_for, teamsize_max_reduce);
      }
    }

    int teamsize_max = teamsize_max_for;
    if (fpair->eflag || fpair->vflag)
      teamsize_max = teamsize_max_reduce;
    atoms_per_team = teamsize_max/vectorsize;
#else
    vectorsize = 1;
    atoms_per_team = 1;
#endif

    const int num_teams = inum / atoms_per_team + (inum % atoms_per_team ? 1 : 0);

    if (fpair->atom->ntypes > MAX_TYPES_STACKPARAMS) {
      PairBvComputeFunctor<PairStyle,NEIGHFLAG,false,ZEROFLAG,Specialisation > ff(fpair,list);
      Kokkos::TeamPolicy<typename PairStyle::device_type,Kokkos::IndexType<int> > policy(num_teams,atoms_per_team,vectorsize);
      if (fpair->eflag || fpair->vflag) Kokkos::parallel_reduce(policy,ff,ev);
      else                              Kokkos::parallel_for(policy,ff);
      ff.contribute();
    } else {
      PairBvComputeFunctor<PairStyle,NEIGHFLAG,true,ZEROFLAG,Specialisation > ff(fpair,list);
      Kokkos::TeamPolicy<typename PairStyle::device_type,Kokkos::IndexType<int> > policy(num_teams,atoms_per_team,vectorsize);
      if (fpair->eflag || fpair->vflag) Kokkos::parallel_reduce(policy,ff,ev);
      else                              Kokkos::parallel_for(policy,ff);
      ff.contribute();
    }
  } else {
    if (fpair->atom->ntypes > MAX_TYPES_STACKPARAMS) {
      PairBvComputeFunctor<PairStyle,NEIGHFLAG,false,ZEROFLAG,Specialisation > ff(fpair,list);
      if (fpair->eflag || fpair->vflag) Kokkos::parallel_reduce(inum,ff,ev);
      else                              Kokkos::parallel_for(inum,ff);
      ff.contribute();
    } else {
      PairBvComputeFunctor<PairStyle,NEIGHFLAG,true,ZEROFLAG,Specialisation > ff(fpair,list);
      if (fpair->eflag || fpair->vflag) Kokkos::parallel_reduce(inum,ff,ev);
      else                              Kokkos::parallel_for(inum,ff);
      ff.contribute();
    }
  }

I used the original functor structure to create an instance that could be used to get the values of teamsize_max_for and teamsize_max_reduce, which helped me determine num_teams, atoms_per_team, and vectorsize. These give me the policy for the parallel dispatch. However, the functor instances that I pass as arguments to parallel_for and parallel_reduce to do the actual computation come from the PairBvComputeFunctor structure, which has the desired compute functions within it.

Those are the major changes I made to the pair_kokkos.h file in order to accelerate my in-house potential. I would like to ask for your advice and any cautions before I go straight to testing it.

Pair_kokkos.h is meant for simple pair-wise potentials. When a pair style doesn’t fit into the abstractions in pair_kokkos.h, it needs to have its own parallel dispatch instead. Here is a good example:

This is also done for all the manybody potentials like EAM, Tersoff, ReaxFF, and SNAP, which don’t fit into pair_kokkos.h.

So modifying pair_kokkos.h would not be a viable path forward for getting the in-house potential released in public LAMMPS. Kokkos does support nested loops, though parallelizing over neighbors in addition to atoms may have overheads. Happy to discuss more if you have specific questions.
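To make the "own parallel dispatch" part concrete: the manybody Kokkos styles launch their own kernels from compute(), typically one tagged operator per pass (pair_eam_kokkos, for instance, does something along these lines with one work tag per kernel). Below is a minimal standalone Kokkos sketch of that pattern, not LAMMPS code; every name in it is made up, and the kernel bodies are trivial stand-ins for the real per-atom and reduction work.

// Minimal standalone sketch of a style owning its own dispatch: one tagged
// operator per pass, launched back to back. Names and kernel bodies are
// placeholders, not the bond valence model.
#include <Kokkos_Core.hpp>

struct TagPassValence {};   // pass 1: fill a per-atom quantity
struct TagPassEnergy  {};   // pass 2: consume it and reduce an energy

struct TwoPassSketch {
  Kokkos::View<double*> s0;    // per-atom "bond valence"
  Kokkos::View<double*> rho;   // stand-in per-atom input

  KOKKOS_INLINE_FUNCTION
  void operator()(TagPassValence, const int i) const {
    s0(i) = 2.0 * rho(i);      // stand-in for the neighbor accumulation
  }

  KOKKOS_INLINE_FUNCTION
  void operator()(TagPassEnergy, const int i, double &esum) const {
    const double s = s0(i) - 1.0;   // stand-in for s0[i] - v0
    esum += s * s;                  // stand-in for the per-atom energy
  }
};

int main(int argc, char **argv) {
  Kokkos::initialize(argc, argv);
  {
    const int n = 1000;
    TwoPassSketch k{Kokkos::View<double*>("s0", n), Kokkos::View<double*>("rho", n)};
    Kokkos::deep_copy(k.rho, 1.0);

    // The style decides its own policies and launch order, analogous to what a
    // dedicated pair_*_kokkos compute() would do instead of pair_compute_neighlist.
    Kokkos::parallel_for(Kokkos::RangePolicy<TagPassValence>(0, n), k);
    double evdwl = 0.0;
    Kokkos::parallel_reduce(Kokkos::RangePolicy<TagPassEnergy>(0, n), k, evdwl);
    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0;
}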


Thank you so much! I will see how these complex potentials do it.

With regard to parallelizing over neighbors in addition to atoms, I did not express my idea clearly. The two nested loops that I talked about are just like the following:

    Kokkos::parallel_for(Kokkos::TeamThreadRange(team, firstatom, lastatom), [&] (const int &ii) {

      const int i = list.d_ilist[ii];
      const X_FLOAT xtmp = c.x(i,0);
      const X_FLOAT ytmp = c.x(i,1);
      const X_FLOAT ztmp = c.x(i,2);
      const int itype = c.type(i);

      if (NEIGHFLAG == FULL && ZEROFLAG) {
        Kokkos::single(Kokkos::PerThread(team), [&] (){
          f(i,0) = 0.0;
          f(i,1) = 0.0;
          f(i,2) = 0.0;
        });
      }

      const AtomNeighborsConst neighbors_i = list.get_neighbors_const(i);
      const int jnum = list.d_numneigh[i];

      t_scalar3<double> fsum;

      Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,jnum),
        [&] (const int jj, t_scalar3<double>& ftmp) {

        int j = neighbors_i(jj);
        const F_FLOAT factor_lj = c.special_lj[sbmask(j)];
        j &= NEIGHMASK;
        const X_FLOAT delx = xtmp - c.x(j,0);
        const X_FLOAT dely = ytmp - c.x(j,1);
        const X_FLOAT delz = ztmp - c.x(j,2);
        const int jtype = c.type(j);
        const F_FLOAT rsq = delx*delx + dely*dely + delz*delz;

        if (rsq < (STACKPARAMS?c.m_cutsq[itype][jtype]:c.d_cutsq(itype,jtype))) {

          const F_FLOAT fpair = factor_lj*c.template compute_fpair<STACKPARAMS,Specialisation>(rsq,i,j,itype,jtype);

          const F_FLOAT fx = delx*fpair;
          const F_FLOAT fy = dely*fpair;
          const F_FLOAT fz = delz*fpair;

          ftmp.x += fx;
          ftmp.y += fy;
          ftmp.z += fz;

          if ((NEIGHFLAG == HALF || NEIGHFLAG == HALFTHREAD) && j < c.nlocal) {
            a_f(j,0) -= fx;
            a_f(j,1) -= fy;
            a_f(j,2) -= fz;
          }
        }

      },fsum);

      Kokkos::single(Kokkos::PerThread(team), [&] () {
        a_f(i,0) += fsum.x;
        a_f(i,1) += fsum.y;
        a_f(i,2) += fsum.z;
      });

    });

But in one of them we are computing a quantity other than forces (the bond valence), and in the other we are using that computed quantity to compute the force, energy, and virial components.

If you have not done so, you may want to check out Stan’s lectures at the recent workshop that discuss KOKKOS styles in LAMMPS in depth. https://www.youtube.com/watch?v=MBOO6OYoy2A


The video was really helpful. Thank you Stan and Axel! I have finished converting the potential files to a KOKKOS-supported version, but before I test it on Perlmutter, I want to double-check a few things. The bond valence pair style (the one I am working on) is really similar to the EAM pair style: in EAM the density on each atom is used to compute forces, energy, and virial tensor components, whereas in the bond valence potential the bond valence on each atom is used for the same purpose. The major difference is that the variables in EAM are initialized by reading a file, while the variables in BV are initialized from input arguments, which is similar to pair_lj_cut, so I also used the pair_lj_cut_kokkos files as guidance.

I want to double-check my syntax for initializing and using these variables (the members of the params_bv structure, energy0, and cutsq) with Kokkos.
Starting off with a related snippet of pair_bv_kokkos.h:

public:
 
 struct params_bv{
    KOKKOS_INLINE_FUNCTION
    params_bv() {cut=0;cutsq=0;r0=0;alpha=0;sparam=0;v0=0;offset=0;};
    KOKKOS_INLINE_FUNCTION
    params_bv(int /*i*/) {cut=0;cutsq=0;r0=0;alpha=0;sparam=0;v0=0;offset=0;};
    F_FLOAT cut,cutsq,r0,alpha,sparam,v0,offset;
  };

 protected:

  Kokkos::DualView<params_bv**,Kokkos::LayoutRight,DeviceType> k_params;
  // params is the unmanaged/device view of the dual view
  typename Kokkos::DualView<params_bv**,Kokkos::LayoutRight,DeviceType>::t_dev_const_um params;
  // m_params is a stack-held array of params_bv entries, used on the STACKPARAMS fast path
  params_bv m_params[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];  
  
  DAT::tdual_ffloat_1d k_energy0;
  typename AT::t_ffloat_1d d_energy0;
  F_FLOAT m_energy0[MAX_TYPES_STACKPARAMS+1];
  
  F_FLOAT m_cutsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
  typename AT::tdual_ffloat_2d k_cutsq;
  typename AT::t_ffloat_2d d_cutsq;

allocate function:

template<class DeviceType>
void PairBVKokkos<DeviceType>::allocate()
{
  PairBV::allocate();
  int n = atom->ntypes;
  memory->destroy(cutsq);
  memoryKK->create_kokkos(k_cutsq,cutsq,n+1,n+1,"pair:cutsq");
  d_cutsq = k_cutsq.template view<DeviceType>();
    
  k_params = Kokkos::DualView<params_bv**,Kokkos::LayoutRight,DeviceType>("PairBV::params",n+1,n+1);
  params = k_params.template view<DeviceType>();
    
  k_energy0 = DAT::tdual_ffloat_1d("pair:energy0",n+1);
  d_energy0 = k_energy0.template view<DeviceType>();

}

init_one function

template<class DeviceType>
double PairBVKokkos<DeviceType>::init_one(int i, int j)
{
  double cutone = PairBV::init_one(i,j);

  k_params.h_view(i,j).r0 = r0[i][j];
  k_params.h_view(i,j).alpha = alpha[i][j];
  k_params.h_view(i,j).sparam = sparam[i][j];
  k_params.h_view(i,j).v0 = v0[i][j];
  k_params.h_view(i,j).offset = offset[i][j];
  k_params.h_view(i,j).cutsq = cutone*cutone;
  k_params.h_view(j,i) = k_params.h_view(i,j);
  if (i<MAX_TYPES_STACKPARAMS+1 && j<MAX_TYPES_STACKPARAMS+1) {
    m_params[i][j] = m_params[j][i] = k_params.h_view(i,j);
    m_cutsq[j][i] = m_cutsq[i][j] = cutone*cutone;
  }
  if (i==j) {
    k_energy0.h_view(i) = energy0[i];
    // also fill the stack copy used on the STACKPARAMS fast path
    if (i < MAX_TYPES_STACKPARAMS+1) m_energy0[i] = energy0[i];
  }

  k_cutsq.h_view(i,j) = k_cutsq.h_view(j,i) = cutone*cutone;
  k_cutsq.template modify<LMPHostType>();
  k_params.template modify<LMPHostType>();
  k_energy0.template modify<LMPHostType>();

  return cutone;
}

snippet of the compute function:

  atomKK->sync(execution_space,datamask_read);
  k_cutsq.template sync<DeviceType>();
  k_params.template sync<DeviceType>();
  k_energy0.template sync<DeviceType>();
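For reference, the handshake those snippets rely on (fill h_view on the host, mark it modified, sync to the device, read the device view in a kernel) can be exercised in isolation with a small standalone Kokkos sketch. This is not the pair style code; the view label, sizes, and values are arbitrary, and it assumes a Kokkos version that provides modify_host()/sync_device():

// Standalone sketch of the DualView handshake used above: fill on the host,
// mark the host side as modified, sync to the device, then read in a kernel.
// Plain Kokkos only; the view label, sizes, and values are arbitrary.
#include <Kokkos_Core.hpp>
#include <Kokkos_DualView.hpp>

int main(int argc, char **argv) {
  Kokkos::initialize(argc, argv);
  {
    const int ntypes = 4;
    Kokkos::DualView<double**> k_cut("sketch:cutsq", ntypes + 1, ntypes + 1);

    // host side: the analogue of filling h_view in init_one()
    for (int i = 1; i <= ntypes; ++i)
      for (int j = 1; j <= ntypes; ++j)
        k_cut.h_view(i, j) = 2.5 * 2.5;
    k_cut.modify_host();          // same role as k_cutsq.template modify<LMPHostType>()

    // device side: the analogue of the sync calls at the top of compute()
    k_cut.sync_device();          // same role as k_cutsq.template sync<DeviceType>()
    auto d_cut = k_cut.d_view;    // device view captured by the kernel below

    double total = 0.0;
    Kokkos::parallel_reduce("sketch:read", ntypes,
      KOKKOS_LAMBDA(const int i, double &sum) { sum += d_cut(i + 1, i + 1); },
      total);
    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0;
}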

a snippet of one operator showing how I use these variables:

    const F_FLOAT rsq = delx*delx + dely*dely + delz*delz;
    
    // Stack-allocated parameters can be faster; with too many atom types this falls back to global memory
    if((STACKPARAMS?m_params[itype][jtype].alpha:params(itype,jtype).alpha)!=0.0){
        if (rsq < (STACKPARAMS?m_cutsq[itype][jtype]:d_cutsq(itype,jtype))) {
          F_FLOAT recip = 1.0/sqrt(rsq);
          F_FLOAT r = sqrt(rsq);
          s0tmp += pow((STACKPARAMS?m_params[itype][jtype].r0:params(itype,jtype).r0)/r,(STACKPARAMS?m_params[itype][jtype].alpha:params(itype,jtype).alpha));
          if (NEWTON_PAIR || j < nlocal) {
            a_s0[j] += pow((STACKPARAMS?m_params[jtype][itype].r0:params(jtype,itype).r0)/r,(STACKPARAMS?m_params[jtype][itype].alpha:params(jtype,itype).alpha));
          }
        }
    }

and a snippet of a different operator showing how I use these variables:

  F_FLOAT s = d_s0[i] - (STACKPARAMS?m_params[itype][itype].v0:params(itype,itype).v0);
  F_FLOAT ss = s*s;
  d_fp[i] = (STACKPARAMS?m_params[itype][itype].sparam:params(itype,itype).sparam)*power_global*s;
  if (EFLAG) {
    F_FLOAT phi = (STACKPARAMS?m_params[itype][itype].sparam:params(itype,itype).sparam)*ss+(STACKPARAMS?m_energy0[itype]:d_energy0(itype));
    if (eflag_global) ev.evdwl += phi;
    if (eflag_atom) d_eatom[i] += phi;
  }

Could you let me know whether I have initialized and used the variables in the params_bv structure, energy0, and cutsq correctly? Thank you!