Question on inter-processor communication

Dear all,
I have a question about communication of forces/torques between processors.
I have implemented my own pair style that calculates a torque on a particle. The interaction uses a full neighbor list, and the torque is computed for each atom, i.e. only the ghost atoms' positions and velocities are used to calculate the interaction for the 'real' (owned) particles on a processor, and no update is made to the ghost atoms' torque (which is computed by the processor where the atom is 'real'). In essence I have implemented:

inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;

for (ii = 0; ii < inum; ii++) {
  i = ilist[ii];
  jlist = firstneigh[i];
  jnum = numneigh[i];
  for (jj = 0; jj < jnum; jj++) {
    j = jlist[jj];
    // calculate interaction based on r_ij and v_ij
    // update torque[i][:]
  }
}

I have not made any updates to torque[j], I use full neighbor lists, and there is no processor communication in my pair style. I want to know if the inter-processor communication might nevertheless end up sending something to the ghost atoms (in force.cpp/verlet.cpp), or am I 'safe'? My newton setting is 'on' because I use LJ interactions, which require force communication between processors… Any help would be appreciated.


Added: to test, I ran the same simulation on 1 core vs. 4 cores, and I see that small changes accumulate and cascade into different particle positions. Although I only care about ensemble averages in the end, I still want to make sure there is nothing wrong in my implementation…

Is your new pair style a simple 2-body interaction (like Lennard-Jones), or a many-body interaction (like Tersoff)? LAMMPS automatically takes care of summing torques on ghost atoms back into the owned atoms every timestep; this is called "reverse" communication, and you don't need to do it in your pair style. With a full neighbor list and newton on, however, that reverse communication is just summing zeros, since you never update the ghost atoms.

If it is a simple 2-body interaction, then what you wrote should work; we use something like this for the Kokkos package. But it doubles the computation and will probably be slow on CPUs. Using a half neighbor list will likely be faster.

If it is many-body like Tersoff or SNAP, then it is more complex, and what you have will not work.

It is normal to see trajectories diverge over time when running on different numbers of MPI ranks.

This statement makes no sense. You can run LJ interactions with either setting, newton on or newton off. Only in the case of newton on will there be a reverse communication, and that is independent of whether you have a full or a half neighbor list. There is no forward communication of forces or torques.

There are two possible sources for this accumulated divergence: 1) floating-point truncation due to the non-associative nature of floating-point math (the exact total force and torque on each atom depends on the order in which contributions are summed, which differs with different domain decompositions), and 2) inconsistent computation of forces depending on the order of the two atoms in each pair. Only you are in the position to tell whether either or both apply, and to what extent. Also, you can easily set up a simulation of a similar size with a different pair style that also computes forces and torques, and compare its divergence with the number of processors to yours.