We are trying to accelerate LAMMPS for the Rhodopsin benchmark using a single CPU + FPGA system. We find that the non-coalesced access to “j” atoms (neighbor atoms) during the force update is a bottleneck, and wonder whether we can get around it.
If we do not use Newton’s third law, i.e. set the “newton” flag to “off” in the LAMMPS input file, I would have expected LAMMPS to skip the force summation for “j” atoms and instead use longer neighbor lists with redundant force calculations. However, the conditional statement within pair_lj_charmm_coul_long.cpp::compute() still sums forces onto “j” atoms when “j < nlocal”, irrespective of the “newton” flag setting. Is it possible to completely avoid the following? (Maybe with some modification to the neighbor list build.)
The newton setting only affects I,J pairs where I and J are
on different procs. An I,J pair within a proc sub-domain will
still be stored only once, and thus summed to both I and J.
If you want to get rid of this logic, then I would create
a new pair_style (derived from the old one) and do 2 things:
a) delete the code that sums the force to J
b) have that pair_style request a "full" neighbor
list instead of the default "half"
A full list will ignore Newton completely, even when I and J are
both on the same proc. This is the strategy a couple of folks have used
for GPU versions of LAMMPS pair styles. The assumption
is that double-computing everything is still faster.
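To make the half vs. full distinction concrete, here is a minimal standalone sketch. It is not LAMMPS code: the toy 1/r^2 force, the all-pairs loops standing in for neighbor lists (no cutoff), and the names Vec3, toy_pair_force, forces_half, forces_full and forces_agree are all made up for illustration. It only shows that the full-list loop writes to f[i] alone, at the cost of evaluating every pair twice, and still gives the same forces.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Toy 1/r^2 repulsion (illustration only, NOT the CHARMM LJ/coulomb form):
// returns the force on the atom at ri due to the atom at rj.
Vec3 toy_pair_force(const Vec3 &ri, const Vec3 &rj) {
  const double dx = ri.x - rj.x, dy = ri.y - rj.y, dz = ri.z - rj.z;
  const double r2 = dx * dx + dy * dy + dz * dz;
  const double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
  return {dx * inv_r3, dy * inv_r3, dz * inv_r3};
}

// "Half" list: each I,J pair is visited once and the force is summed to
// both I and J. The write to f[j] is the scattered access in question.
std::vector<Vec3> forces_half(const std::vector<Vec3> &x) {
  std::vector<Vec3> f(x.size(), Vec3{0.0, 0.0, 0.0});
  for (std::size_t i = 0; i < x.size(); ++i) {
    for (std::size_t j = i + 1; j < x.size(); ++j) {
      const Vec3 fij = toy_pair_force(x[i], x[j]);
      f[i].x += fij.x; f[i].y += fij.y; f[i].z += fij.z;
      f[j].x -= fij.x; f[j].y -= fij.y; f[j].z -= fij.z;
    }
  }
  return f;
}

// "Full" list: every pair is visited from both sides, so the force is
// computed twice but only f[i] is ever written.
std::vector<Vec3> forces_full(const std::vector<Vec3> &x) {
  std::vector<Vec3> f(x.size(), Vec3{0.0, 0.0, 0.0});
  for (std::size_t i = 0; i < x.size(); ++i) {
    for (std::size_t j = 0; j < x.size(); ++j) {
      if (j == i) continue;
      const Vec3 fij = toy_pair_force(x[i], x[j]);
      f[i].x += fij.x; f[i].y += fij.y; f[i].z += fij.z;
    }
  }
  return f;
}

// Compare two force arrays component by component within a tolerance
// (the summation order differs, so exact equality is not expected).
bool forces_agree(const std::vector<Vec3> &a, const std::vector<Vec3> &b,
                  double tol) {
  assert(a.size() == b.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::fabs(a[i].x - b[i].x) > tol) return false;
    if (std::fabs(a[i].y - b[i].y) > tol) return false;
    if (std::fabs(a[i].z - b[i].z) > tol) return false;
  }
  return true;
}
```

Calling forces_agree(forces_half(x), forces_full(x), 1e-12) on a handful of distinct positions should return true. In a real pair style the inner loop would run over the neighbor list of atom I rather than over all atoms.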
1) Deleted the force summations in PairLjCharmmCoulLong::compute() for
J atoms
2) Removed the virial summations for j atoms in ev_tally()
3) Added a full neighbor list request instead of half in init_style()
as below
4) Set Newton flag as "off" in the input script.
A full neighbor list is being created; however, the final output energy
numbers do not match those from a run without the above changes.
Are we missing something? Please let me know.
yes. you need to write a different version of ev_tally(),
i.e. you need ev_tally_full() that is being called for full
neighbor lists on regular pair potentials.
i have written a version of that for playing with many-core cpus,
but it is only mildly tested. i can send it to you (and steve
for inclusion into pair.cpp), if desired.
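For readers wondering why the tally has to change: with a full list every I,J pair is visited twice, so a tally that adds the whole pair energy on each visit double-counts. The sketch below is a standalone toy, not the ev_tally_full() offered above and not the LAMMPS API. The 1/r energy, the all-pairs loops standing in for neighbor lists, and the names Point, toy_pair_energy, energy_half_list, energy_full_list_naive and energy_full_list_tallied are assumptions made for illustration. It shows the usual fix of crediting half of the pair energy per visit; presumably the pair virial needs the same factor.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

// Toy 1/r pair energy (illustration only, not the CHARMM form).
double toy_pair_energy(const Point &a, const Point &b) {
  const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return 1.0 / std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Half list: each pair is visited once, full pair energy tallied once.
double energy_half_list(const std::vector<Point> &x) {
  double e = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    for (std::size_t j = i + 1; j < x.size(); ++j)
      e += toy_pair_energy(x[i], x[j]);
  return e;
}

// Full list with the half-list tally left unchanged: every pair is seen
// from both I and J, so the total comes out twice too large.
double energy_full_list_naive(const std::vector<Point> &x) {
  double e = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j)
      if (j != i) e += toy_pair_energy(x[i], x[j]);
  return e;
}

// Full list with a full-list tally: credit half of the pair energy on
// each of the two visits, so each pair contributes exactly once in total.
double energy_full_list_tallied(const std::vector<Point> &x) {
  double e = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j)
      if (j != i) e += 0.5 * toy_pair_energy(x[i], x[j]);
  return e;
}
```

On any set of distinct positions, energy_full_list_tallied() should match energy_half_list() to rounding, while energy_full_list_naive() should come out at twice that value, which is the kind of mismatch reported above.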