@akohlmey once said:
Also, the standard, half-neighbor list version of a pairwise potential, has a race condition when updating forces, energies, and virial in parallel, so you may want to use a version with a full neighbor list to avoid the race with the force update. The race due to calling ev_tally() within each inner loop iteration still needs to be addressed.
I'm using pair lj/charmmfsw/coul/long/kk, which cannot be used with a full neighbor list:
ERROR: Dihedral_style charmm/kk requires half neighbor list (src/KOKKOS/dihedral_charmmfsw_kokkos.cpp:83)
In his recent KOKKOS YouTube class, @stamoor said that small numerical differences from run to run on a GPU can be caused by a race condition. I see this not only with pair lj/charmmfsw/coul/long/kk (every run has a relative error of ~1e-3 compared to previous ones, while the serial runs are always identical) but also with other pair style examples I ran with KOKKOS (e.g. pair lj/cut/coul/long/kk in examples/rdf-adf/in.spce), so my hypothesis is that this is caused by the race condition in the half neighbor list.
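To make clear what I mean by the race, here is a minimal serial sketch of the two write patterns as I understand them. It is not LAMMPS or KOKKOS code: `pair_force`, `forces_half`, and `forces_full` are hypothetical names, and the 1D toy force is only an assumption for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy 1D pair force (hypothetical stand-in for a real potential).
// Antisymmetric: pair_force(a, b) == -pair_force(b, a).
static double pair_force(double xi, double xj) { return xi - xj; }

// Half list: each pair (i, j) is stored once, and the loop body writes to
// BOTH f[i] and f[j]. If the loop over pairs were split across threads,
// two threads could update the same f[j] at the same time: that is the race.
std::vector<double> forces_half(
    const std::vector<double>& x,
    const std::vector<std::pair<int, int>>& half_pairs) {
  std::vector<double> f(x.size(), 0.0);
  for (const auto& p : half_pairs) {
    const double fij = pair_force(x[p.first], x[p.second]);
    f[p.first] += fij;   // write to atom i
    f[p.second] -= fij;  // scatter write to atom j (Newton's third law)
  }
  return f;
}

// Full list: atom i lists ALL of its neighbors, so each pair is computed
// twice, but iteration i only ever writes f[i]. Splitting the loop over i
// across threads therefore needs no synchronization for the force update.
std::vector<double> forces_full(const std::vector<double>& x,
                                const std::vector<std::vector<int>>& neigh) {
  assert(neigh.size() == x.size());
  std::vector<double> f(x.size(), 0.0);
  for (std::size_t i = 0; i < x.size(); ++i) {
    double fi = 0.0;  // private accumulator for atom i
    for (int j : neigh[i]) fi += pair_force(x[i], x[j]);
    f[i] = fi;        // single write, owned by iteration i
  }
  return f;
}
```

Run serially, both functions give the same forces. The difference only matters once the outer loop is parallelized, where the half-list version would need atomic updates or per-thread copies of `f` to stay correct.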
I cannot use KOKKOS for production until this numerical error issue is solved, so I'm willing to pitch in.
Any suggestions on how to tackle this problem? As @akohlmey said, “The race due to calling ev_tally() within each inner loop iteration still needs to be addressed.” So I would appreciate suggestions on how to address it.
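For reference, the generic pattern I have in mind for the tally is to accumulate into a private variable and combine the partial sums in a fixed order afterwards. The sketch below is serial and purely illustrative: `tally_energy_chunked` is a hypothetical name, the chunks stand in for threads, and I have not checked how the KOKKOS package actually handles ev_tally().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// epair holds per-pair energy contributions, i.e. the values an
// ev_tally()-style call would otherwise add to one shared accumulator
// inside the inner loop (assumption for illustration only).
double tally_energy_chunked(const std::vector<double>& epair,
                            std::size_t nchunks) {
  assert(nchunks > 0);
  std::vector<double> partial(nchunks, 0.0);
  const std::size_t n = epair.size();
  for (std::size_t c = 0; c < nchunks; ++c) {
    // Chunk c owns the index range [begin, end) and is the only
    // writer of partial[c], so nothing is shared between chunks.
    const std::size_t begin = c * n / nchunks;
    const std::size_t end = (c + 1) * n / nchunks;
    double local = 0.0;  // private accumulator
    for (std::size_t k = begin; k < end; ++k) local += epair[k];
    partial[c] = local;  // one write per chunk
  }
  // Combine the partial sums in a fixed order, so the result does not
  // depend on which chunk happened to finish first.
  double total = 0.0;
  for (double p : partial) total += p;
  return total;
}
```

If each chunk were run by its own thread, the only shared data would be `partial`, with one slot per thread, and the final reduction order would stay the same from run to run.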