Modifying the Kokkos package to support in house potential

Hi, I have been working on modifying our in house potential called bond valence model to be compatible with kokkos package. This in house potential requires two nested loops, the first nested loop(iterating over all the local atoms, for which then iterates over its neighbor atoms) helps us determine the bond valence for each atom. The second nested loop using the predetermined bond valence of the local and neighbor atom as well as the distance between the local and neighbor atom to determine their pairwise force, update the virial tensor, and update the energy.

However, the kokkos package does not support this. So I modified the some parts of the pair_kokkos.h file:

The first thing I have modified is adding a new functor structure named ‘PairComputeFunctorBv’ in which has the compute functions that are suitable to calculating the bond valence as well as pairwise forces, energy, and virial from the calculated bond valence. I have also adjusted alias definitions, constructor, and contribute function within the structure accordingly.

The second thing I have done is to change some part of the pair_compute_neighlist function.
Here is the relevant part in the pair_compute_neighlist function:

 if (fpair->atom->ntypes > MAX_TYPES_STACKPARAMS) {
        PairComputeFunctor<PairStyle,NEIGHFLAG,false,ZEROFLAG,Specialisation > ff(fpair,list);
        GetMaxTeamSize<typename PairStyle::device_type>(ff, inum, teamsize_max_for, teamsize_max_reduce);
      } else {
        PairComputeFunctor<PairStyle,NEIGHFLAG,true,ZEROFLAG,Specialisation > ff(fpair,list);
        GetMaxTeamSize<typename PairStyle::device_type>(ff, inum, teamsize_max_for, teamsize_max_reduce);
      }
    }

    int teamsize_max = teamsize_max_for;
    if (fpair->eflag || fpair->vflag)
      teamsize_max = teamsize_max_reduce;
    atoms_per_team = teamsize_max/vectorsize;
#else
    vectorsize = 1;
    atoms_per_team = 1;
#endif

    const int num_teams = inum / atoms_per_team + (inum % atoms_per_team ? 1 : 0);

    if (fpair->atom->ntypes > MAX_TYPES_STACKPARAMS) {
      PairBvComputeFunctor<PairStyle,NEIGHFLAG,false,ZEROFLAG,Specialisation > ff(fpair,list);
      Kokkos::TeamPolicy<typename PairStyle::device_type,Kokkos::IndexType<int> > policy(num_teams,atoms_per_team,vectorsize);
      if (fpair->eflag || fpair->vflag) Kokkos::parallel_reduce(policy,ff,ev);
      else                              Kokkos::parallel_for(policy,ff);
      ff.contribute();
    } else {
      PairBvComputeFunctor<PairStyle,NEIGHFLAG,true,ZEROFLAG,Specialisation > ff(fpair,list);
      Kokkos::TeamPolicy<typename PairStyle::device_type,Kokkos::IndexType<int> > policy(num_teams,atoms_per_team,vectorsize);
      if (fpair->eflag || fpair->vflag) Kokkos::parallel_reduce(policy,ff,ev);
      else                              Kokkos::parallel_for(policy,ff);
      ff.contribute();
    }
  } else {
    if (fpair->atom->ntypes > MAX_TYPES_STACKPARAMS) {
      PairBvComputeFunctor<PairStyle,NEIGHFLAG,false,ZEROFLAG,Specialisation > ff(fpair,list);
      if (fpair->eflag || fpair->vflag) Kokkos::parallel_reduce(inum,ff,ev);
      else                              Kokkos::parallel_for(inum,ff);
      ff.contribute();
    } else {
      PairBvComputeFunctor<PairStyle,NEIGHFLAG,true,ZEROFLAG,Specialisation > ff(fpair,list);
      if (fpair->eflag || fpair->vflag) Kokkos::parallel_reduce(inum,ff,ev);
      else                              Kokkos::parallel_for(inum,ff);
      ff.contribute();
    }
  }

I used the original functor structure to initialize an instance of that structure that could be used to get the value for teamsize_max_for and teamsize_max_reduce, which helped me to determine the nums_team, atoms_per_team, and vectorsize. These help me to obtain the policy for parallel processing. But, the instance of the functor structure that I passed in as arguments for parallel_for and parallel_reduce to do the computation are from the PairComputeFunctorBv structure, which has the desired compute functions within it.

Those are the major changes I made on pair_kokkos.h file in order to accelerate my inhouse potential. I want to ask for some of your advice and cautions before I go straight to testing it.

Pair_kokkos.h is meant for simple pair-wise potentials. When a pair style doesn’t fit into the abstractions in pair_kokkos.h, then it needs to have its own parallel dispatch instead. Here is a good example:

This is also done for all the manybody potentials like EAM, Tersoff, ReaxFF, and SNAP, which don’t fit into pair_kokkos.h.

So modifying pair_kokkos.h would not be a viable path forward to getting the in house potential released in public LAMMPS. Kokkos supports nested loops, though parallelizing over neighbors in addition to atoms may have overheads. Happy to discuss more if you have specific questions.

1 Like

Thank you so much! I will see how these complex potentials do it.

With regard to parallelizing over neighbors in addition to atoms, I did not express my idea clearly. The two nested loop that I talked about is just like the following:

    Kokkos::parallel_for(Kokkos::TeamThreadRange(team, firstatom, lastatom), [&] (const int &ii) {

      const int i = list.d_ilist[ii];
      const X_FLOAT xtmp = c.x(i,0);
      const X_FLOAT ytmp = c.x(i,1);
      const X_FLOAT ztmp = c.x(i,2);
      const int itype = c.type(i);

      if (NEIGHFLAG == FULL && ZEROFLAG) {
        Kokkos::single(Kokkos::PerThread(team), [&] (){
          f(i,0) = 0.0;
          f(i,1) = 0.0;
          f(i,2) = 0.0;
        });
      }

      const AtomNeighborsConst neighbors_i = list.get_neighbors_const(i);
      const int jnum = list.d_numneigh[i];

      t_scalar3<double> fsum;

      Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,jnum),
        [&] (const int jj, t_scalar3<double>& ftmp) {

        int j = neighbors_i(jj);
        const F_FLOAT factor_lj = c.special_lj[sbmask(j)];
        j &= NEIGHMASK;
        const X_FLOAT delx = xtmp - c.x(j,0);
        const X_FLOAT dely = ytmp - c.x(j,1);
        const X_FLOAT delz = ztmp - c.x(j,2);
        const int jtype = c.type(j);
        const F_FLOAT rsq = delx*delx + dely*dely + delz*delz;

        if (rsq < (STACKPARAMS?c.m_cutsq[itype][jtype]:c.d_cutsq(itype,jtype))) {

          const F_FLOAT fpair = factor_lj*c.template compute_fpair<STACKPARAMS,Specialisation>(rsq,i,j,itype,jtype);

          const F_FLOAT fx = delx*fpair;
          const F_FLOAT fy = dely*fpair;
          const F_FLOAT fz = delz*fpair;

          ftmp.x += fx;
          ftmp.y += fy;
          ftmp.z += fz;

          if ((NEIGHFLAG == HALF || NEIGHFLAG == HALFTHREAD) && j < c.nlocal) {
            a_f(j,0) -= fx;
            a_f(j,1) -= fy;
            a_f(j,2) -= fz;
          }
        }

      },fsum);

      Kokkos::single(Kokkos::PerThread(team), [&] () {
        a_f(i,0) += fsum.x;
        a_f(i,1) += fsum.y;
        a_f(i,2) += fsum.z;
      });

    });

But in one we are computing a quantity other than forces, and in another using our computed quantity to compute the force, energy, and virial components.

If you have not done so, you may want to check out Stan’s lectures at the recent workshop that discuss KOKKOS styles in LAMMPS in depth. https://www.youtube.com/watch?v=MBOO6OYoy2A

2 Likes