A few KOKKOS development questions

(1) what’s the difference between

atomKK->k_type.template view<DeviceType>();
atomKK->k_type.view<DeviceType>();

(2) why is the following commented out in angle_harmonic_kokkos.cpp (@stamoor was the author do you remember ?)

//atomKK->sync(execution_space,datamask_read);
k_k.template sync<DeviceType>();
k_theta0.template sync<DeviceType>();
// if (eflag || vflag) atomKK->modified(execution_space,datamask_modify);
// else atomKK->modified(execution_space,F_MASK);

(3) what is KKDeviceType ? deprecated old way of writing DeviceType ??

(4) what’s the difference if any between

DeviceType().fence();
Kokkos::fence();

  1. No difference in practice. C++ requires the “template” keyword when the type of the object depends on a template parameter (typical inside a class that is itself templated on DeviceType) and treats it as optional otherwise. If your code won’t compile, try adding it :grin:
  2. It has to do with overlapping host/device work in verlet_kokkos.cpp. For this reason, all topology styles (bond, angle, etc.) must set DATAMASK_READ and DATAMASK_MODIFY in the constructor and must not call atomKK->sync/modified themselves. This is a gotcha that needs to be better documented.
  3. KKDeviceType is required for UVM, and is not the same as DeviceType. KKDeviceType uses device memory space in the host execution space; see src/KOKKOS/kokkos_type.h in the lammps/lammps repository on GitHub (commit 0f13f632e25b23871595195da7cf162f1a3c274e).
  4. Fences are only needed in a few special places. So almost always: don’t add any fences to your Kokkos code (it already fences after every kernel launch). I believe DeviceType().fence() only fences that DeviceType, while Kokkos::fence() fences everything (is stronger).
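
The point about the “template” keyword in answer 1 can be reproduced without Kokkos. The sketch below is only an illustration: MockDualView, HostSpace, DeviceSpace and both helper functions are hypothetical stand-ins, not LAMMPS or Kokkos types. It shows that the keyword is optional when the object's type is known and required once that type depends on a template parameter.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for execution spaces (not Kokkos types).
struct HostSpace   { static const char* name() { return "host"; } };
struct DeviceSpace { static const char* name() { return "device"; } };

// Hypothetical mock of a dual view: the only thing it shares with the
// real class is that view<Space>() is a member template.
struct MockDualView {
  template <class Space>
  std::string view() const { return Space::name(); }
};

// Non-template code: the type of dv is fully known, so the compiler
// already knows view is a template and the keyword is optional.
inline std::string from_plain_code(const MockDualView& dv) {
  return dv.view<DeviceSpace>();
}

// Template code: the type of dv depends on DualViewType, so without
// the keyword the '<' after view would be parsed as less-than.
template <class Space, class DualViewType>
std::string from_templated_code(const DualViewType& dv) {
  return dv.template view<Space>();
}
```

Calling from_plain_code(dv) on a MockDualView yields "device" and from_templated_code<HostSpace>(dv) yields "host"; removing the keyword from the second function is what produces the compile error mentioned above.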

Let us know if you have specific questions or need more info.

are pair styles included in “topology styles”, ie. must not use atomKK->sync/modified in pair styles ? or only bond/angle/dihedral ??

No, pair styles are not included.

(5) what is the purpose of c_x in pair styles ? i can’t find anywhere it’s actually used. it’s only declared in the .h and initialized in the .cpp. compiler optimizes it out maybe ??

src % grep -r c_x *                
AMOEBA/pair_amoeba.h:  void pbc_xred();
AMOEBA/pair_amoeba.cpp:  if (amoeba) pbc_xred();
AMOEBA/pair_amoeba.cpp:void PairAmoeba::pbc_xred()
KOKKOS/pair_coul_debye_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_class2_coul_cut_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_buck_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_soft_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_yukawa_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_cut_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_cut_coul_debye_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_coul_debye_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_charmm_coul_charmm_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_coul_long_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_charmm_coul_charmm_implicit_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_coul_cut_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_spica_coul_long_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_class2_coul_cut_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_cut_coul_cut_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_class2_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_yukawa_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_dpd_fdt_energy_kokkos.h:  typename ArrayTypes<DeviceType>::t_x_array c_x;
KOKKOS/pair_lj_cut_dipole_cut_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_expand_coul_long_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_cut_coul_dsf_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_charmm_coul_long_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_charmm_coul_charmm_implicit_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_soft_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_cut_coul_cut_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_cut_coul_long_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_spica_coul_long_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_expand_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_class2_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_buck_coul_cut_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_table_kokkos.h:  typename AT::t_x_array_const c_x;
KOKKOS/pair_buck_coul_long_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_cut_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_class2_coul_long_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_gran_hooke_history_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_cut_dipole_cut_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_expand_coul_long_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_cut_coul_debye_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_morse_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_spica_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_buck_coul_long_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_buck_coul_cut_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_cut_coul_dsf_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_buck_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_gromacs_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_gromacs_coul_gromacs_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_yukawa_colloid_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_charmm_coul_charmm_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_coul_long_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_coul_cut_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_charmm_coul_long_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_morse_kokkos.h:  typename ArrayTypes<DeviceType>::t_x_array c_x;
KOKKOS/pair_lj_charmmfsw_coul_long_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_gromacs_coul_gromacs_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_gran_hooke_history_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_charmmfsw_coul_long_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_yukawa_colloid_kokkos.h:  typename AT::t_x_array c_x;
KOKKOS/pair_lj_gromacs_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_spica_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_table_kokkos.cpp:  x = c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_class2_coul_long_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_cut_coul_long_kokkos.cpp:  c_x = atomKK->k_x.view<DeviceType>();
KOKKOS/pair_lj_expand_kokkos.h:  typename AT::t_x_array c_x;
MISC/fix_accelerate_cos.cpp:  double massone, force_x, acc_x;
MISC/fix_accelerate_cos.cpp:      acc_x = acceleration *
MISC/fix_accelerate_cos.cpp:      force_x = acc_x * massone * force->mvv2e;
min_hftn.cpp:  double  dXInf = calc_xinf_using_mpi_();
min_hftn.cpp:   Private method calc_xinf_using_mpi_
min_hftn.cpp:double MinHFTN::calc_xinf_using_mpi_() const
min_hftn.h:  double calc_xinf_using_mpi_() const;

It is supposed to be a “constant” (read-only) version of the positions, but looks like it isn’t used anywhere.

a few more questions…


(5) why do some KOKKOS fix classes also inherit from KokkosBase (eg. FixShakeKokkos) while others (eg. FixGravityKokkos) do not ? only difference i see are the methods pack_forward_comm_fix_kokkos, unpack_forward_comm_fix_kokkos, pack_exchange_kokkos, unpack_exchange_kokkos.

how is this multiple inheritance not causing problems with vtables on cuda ?


(6) what is a zero functor ?

//if(k_eatom.extent(0)<maxeatom) { // won't work without adding zero functor

(7) why is cuda unified memory being used in FixQEqReaxFFKokkos<DeviceType>::pack_exchange_kokkos() ?

d_buf = typename ArrayTypes<DeviceType>::t_xfloat_1d_um(
  k_buf.template view<DeviceType>().data(),
  k_buf.extent(0)*k_buf.extent(1)
);

(8) how to test pack_exchange_kokkos() ? when does it get called ?? how to modify an example to force it to happen ???


(9) in pack_exchange_kokkos(), what’s the difference between k_exchange_sendlist and k_copylist ?


(10) in unpack_exchange_kokkos(), what are the differences between nrecv, nrecv1, and nextrarecv1 ?


(11) why is there no static_cast<double> in FixShakeKokkos<DeviceType>::pack_exchange_item() but there is static_cast<tagint> in unpack_exchange_kokkos() ? in other words, why do we need to cast one way but not the other ??


Wow good questions:

  1. We need a way to call functions from Kokkos pair styles or fixes in the CommKokkos class without having to cast to every individual style, so we inherit from a common base class. If a fix doesn’t implement communication routines then it doesn’t need to inherit from KokkosBase. These functions are all called from the host CPU, so there is no issue with vtables on the GPU. If you try that with device functions then there are issues with the vtable.

  2. That comment is outdated. It just needs to use Kokkos::deep_copy(d_view,0.0) to zero out the values. Zeroing is done by default when a view is reallocated, but reallocating every timestep is expensive, so using the deep_copy would be better.

  3. It is not unified memory, but an “unmanaged” view, which doesn’t do reference counting and so has less overhead when the view is copied.

  4. You can use -pk kokkos comm/exchange device to force it on when running on the CPU (e.g. for Serial or OpenMP); see the package command page in the LAMMPS documentation.

  5. Sendlist is a list of atoms that are migrating off the current rank to another rank. Copylist is a list of particles that are copied from one place in the view to another to backfill “holes” left by particles that migrated away, so the array remains compact.

  6. It has to do with the different passes in the unpack. Sometimes you only have 1 pass, sometimes you have more than 1, depending on the neighboring proc grid. “Extra” data is carried by fixes so that needs to be communicated too.

  7. I think I just duplicated the original CPU code, which is also inconsistent. I don’t think the static casts are strictly necessary, since the conversion happens implicitly. More correct would be to use the ubuf union as is done in the atom_vec styles, so that no precision is lost when converting a huge 64-bit integer to a double.
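
The sendlist/copylist distinction in answer 5 can be sketched in plain C++. This is not the LAMMPS implementation: build_copylist and apply_copylist are hypothetical names, and a single std::vector<double> stands in for the per-atom views. It assumes that sendlist holds distinct, valid local indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: for each leaving atom (hole) in sendlist, pick a
// staying atom from the tail of the array to move into that hole.
// An entry of -1 means the hole lies in the tail that is dropped anyway.
inline std::vector<int> build_copylist(int nlocal, const std::vector<int>& sendlist) {
  const int nsend = static_cast<int>(sendlist.size());
  const int nkeep = nlocal - nsend;            // array length after migration
  std::vector<char> leaving(nlocal, 0);
  for (int i : sendlist) leaving[i] = 1;

  std::vector<int> copylist(nsend, -1);
  int src = nlocal - 1;                        // scan staying atoms from the end
  for (int k = 0; k < nsend; ++k) {
    if (sendlist[k] >= nkeep) continue;        // hole is truncated, nothing to fill
    while (leaving[src]) --src;                // skip tail atoms that also leave
    copylist[k] = src--;
  }
  return copylist;
}

// Hypothetical helper: backfill the holes and shrink the array so it stays compact.
inline void apply_copylist(std::vector<double>& x, const std::vector<int>& sendlist,
                           const std::vector<int>& copylist) {
  assert(sendlist.size() == copylist.size());
  for (std::size_t k = 0; k < sendlist.size(); ++k)
    if (copylist[k] >= 0) x[sendlist[k]] = x[copylist[k]];
  x.resize(x.size() - sendlist.size());
}
```

With six atoms and sendlist {1, 4}, the copylist comes out as {5, -1}: atom 5 is copied into slot 1, slot 4 needs no fill because it lies beyond the new length, and the array shrinks to four entries.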
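
The precision concern in answer 7 can be shown with a small standalone sketch. The function names are hypothetical. The bit-preserving variant uses std::memcpy rather than the ubuf union itself, which is an assumption made here to stay within standard C++; the idea of storing the integer's bits unchanged in a double-sized slot is the same.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Value conversion: the integer is rounded to the nearest representable
// double, so integers above 2^53 may not survive the round trip.
inline double pack_by_value(std::int64_t v) { return static_cast<double>(v); }
inline std::int64_t unpack_by_value(double d) { return static_cast<std::int64_t>(d); }

// Bit-preserving packing: the 8 bytes of the integer are stored in a
// double-sized slot without interpreting them as a floating-point value.
inline double pack_by_bits(std::int64_t v) {
  static_assert(sizeof(double) == sizeof(std::int64_t), "sizes must match");
  double d;
  std::memcpy(&d, &v, sizeof d);
  return d;
}

// Inverse of pack_by_bits: recover the original integer exactly.
inline std::int64_t unpack_by_bits(double d) {
  std::int64_t v;
  std::memcpy(&v, &d, sizeof v);
  return v;
}
```

For a value such as 2^53 + 1 the value-based round trip returns a different integer, while the bit-preserving round trip returns the original. Small tags such as 42 survive both, which is why the implicit cast usually goes unnoticed.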