[lammps-users] filtering ghost atom interactions when updating a per-atom array in new pair style

Hi everyone,

I am trying to implement a new pair style which requires me to keep track of a number of 1D per-atom arrays. I am using an old-ish release, 7Aug19. but I think that should not be an issue given the simplicity of what I’m trying to do.
If it is of interest to anyone, I am trying to code a double-sum implementation of the 3-body Stillinger-Weber potential (pair_sw)
I am using the pair_eam style as a guide to work out how to communicate and update the per-atom arrays that I have defined. I have also gone through the mail list archive but not exhaustively; forgive me if I’ve missed something obvious.

My issue is that atoms of a given “tag” show up multiple times as ghost atom neighbors of a given atom. I need to understand how to filter out/choose and include only the “correct” pairs from these.
These ghost atom neighbors have different distances with respect to the reference (ilist[i]) atom and I find my code computing the interaction between a pair (itag,jtag) multiple times.
I have a self-written code of the pair style where I calculate the interaction due to any pair once using the nearest image distance in a cubic box with PBC. I am using the numbers from that code as a reference for testing.

Details on test:
N=4 (need more than 2 because the potential is a 3-body potential).
run 1 step and observe total energy and forces on each particle
box-length ~2.2163 with a cut-off of ~3.75

Results compared with those from pair_sw on an identical configuration. Checking if total energy and forces on each particle are the same.

I’ve attached a code snippet with (I think) the main details needed. I have a number of per-atom arrays but I’ve written out only two to get the point across
I’ve only included the compute (with allocation of the new arrays), pack/unpack_reverse_comm routines. There is a debug line where I print out for a given i,j, pair (i, j taken from list->ilist and jlist), the tags and the distance rij.
I’ve omitted details specific to how the potential is calculated, as well as other debug lines, expecting that they are not relevant to the issue.

void PairSWDS::compute(int eflag, int vflag)
{

int i,j,k,ii,jj,kk,inum,jnum,jnumm1;
int itype,jtype,ktype,ijparam,ikparam,ijkparam;
tagint itag,jtag;
double xtmp,ytmp,ztmp,delx,dely,delz,evdwl,fpair;
double rsq,rsq1,rsq2,rp,rq;
double delr1[3],delr2[3],fj[3],fk[3];
int *ilist,*jlist,*numneigh,**firstneigh;

evdwl = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
//printf(“computing DS SW\n”);
double **x = atom->x;
double **f = atom->f;
tagint *tag = atom->tag;
int *type = atom->type;
nlocal = atom->nlocal;
nall = nlocal + atom->nghost;
newton_pair = force->newton_pair;
printf(“newton pair is %d\n”,newton_pair);

inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;

memory->destroy(hi);
memory->destroy(si);
memory->create(hi,nall); // oversized if newton_pair=0
memory->create(si,nall);
int looplimit;

if (newton_pair) looplimit=nall;
else looplimit = nlocal;

for(i=0;i<looplimit;i++){
hi[i]=0.0;
si[i]=0.0;

}

for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
//printf(“hi i, i %f %d\n”,hi[i],i);
itag = tag[i];
itype = map[type[i]];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];

jlist = firstneigh[i];
jnum = numneigh[i];

for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
j &= NEIGHMASK;
jtag = tag[j];

if (itag > jtag) {
if ((itag+jtag) 2 == 0) continue; } else if (itag < jtag) { if ((itag+jtag) 2 == 1) continue;
} else {
if (x[j][2] < ztmp) continue;
if (x[j][2] == ztmp && x[j][1] < ytmp) continue;
if (x[j][2] == ztmp && x[j][1] == ytmp && x[j][0] < xtmp) continue;
//if (itag==jtag) continue;
}

jtype = map[type[j]];

delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delxdelx + delydely + delz*delz;

ijparam = elem2param[itype][jtype][jtype];
if (rsq > params[ijparam].cutsq ) continue; // apt cut-off for given pair of atoms, their coeffs
rij=sqrt(rsq);
// Note : debug print below
printf("%d %d i j %d %d itag jtag %f rij \n",i,j,itag,jtag,rij);
hi[i] += exp(rsq);
si[i] += 2rsq;
// when do we update these?
hi[j] += exp(rsq);
si[j] += 2
rsq;
}

}

if (!newton_pair) comm->reverse_comm_pair(this); // comm needed if supplied half-neighbor list with newton_pair=0

for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
//printf(“hi i, i %f %d\n”,hi[i],i);
itag = tag[i];
itype = map[type[i]];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];

jlist = firstneigh[i];
jnum = numneigh[i];

for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
j &= NEIGHMASK;
jtag = tag[j];

if (itag > jtag) {
if ((itag+jtag) 2 == 0) continue; } else if (itag < jtag) { if ((itag+jtag) 2 == 1) continue;
} else {
if (x[j][2] < ztmp) continue;
if (x[j][2] == ztmp && x[j][1] < ytmp) continue;
if (x[j][2] == ztmp && x[j][1] == ytmp && x[j][0] < xtmp) continue;
//if (itag==jtag) continue;
}

jtype = map[type[j]];

delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delxdelx + delydely + delz*delz;

ijparam = elem2param[itype][jtype][jtype];
if (rsq > params[ijparam].cutsq ) continue; // apt cut-off for given pair of atoms, their coeffs

evdwl = 1/rsq + hi[i] - hi[j] + si[i] + si[j];
fijx = (hi[i] - hi[j] + si[i] + si[j])*delx/sqrt(rsq);
fijy = (hi[i] - hi[j] + si[i] + si[j])*dely/sqrt(rsq);
fijz = (hi[i] - hi[j] + si[i] + si[j])*delz/sqrt(rsq);

f[i][0] += fijx;
f[i][1] += fijy;
f[i][2] += fijz;

f[j][0] -= fijx;
f[j][1] -= fijy;
f[j][2] -= fijz;

if (evflag) ev_tally_xyz(i,j,nlocal,newton_pair,evdwl,0.0,
fijx,fijy,fijz,delx,dely,delz);
}
}
}

int PairSWDS::pack_reverse_comm(int n, int first, double buf)
{
int ii,i,m,last;
int count=0;
m = 0; // buffer index counter - buffer entries not indexed with i directly
last = first + n;
for (ii = first; ii < last; ii++) // loops over indices split to the different procs
{
i=ii-first;
buf[m++] = hi[i];
buf[m++] = si[i];
}
// for(i=first;i<last;i++)
// {
// printf(“packing buffer %d %d %f %f\n”,i,i-first,hi[i-first],buf[13
(i-first)]);
// }
return m; // size of the full buffer that is communicated
}

/* ---------------------------------------------------------------------- */

void PairSWDS::unpack_reverse_comm(int n, int *list, double *buf)
{
int ii,i,j,m;

m = 0; // final value is size of communicated buffer - entries in buffer are not indexed with i
printf(“unpacking %d\n”,n);
for (ii = 0; ii < n; ii++) // loops over indices provided from the other processors
{
j = list[ii];
//printf("%d %d %d %f %f\n",i,j,m,buf[m],hi[j]);
hi[j] += buf[m++];
si[j] += buf[m++];

}
}

// output of debug line - N=4 atoms - box size 2.2163 (all lengths Angstrom)

0 1 i j 1 2 itag jtag 2.494940 rij
0 2 i j 1 3 itag jtag 3.534513 rij <<
0 13 i j 1 2 itag jtag 2.435488 rij <<
0 14 i j 1 3 itag jtag 3.590037 rij
0 44 i j 1 3 itag jtag 3.545202 rij
0 45 i j 1 4 itag jtag 2.278151 rij <<
0 49 i j 1 4 itag jtag 2.572599 rij
0 56 i j 1 3 itag jtag 3.600561 rij
1 24 i j 2 1 itag jtag 2.435488 rij <<
1 31 i j 2 4 itag jtag 3.720288 rij
1 0 i j 2 1 itag jtag 2.494940 rij
1 2 i j 2 3 itag jtag 2.589012 rij
1 6 i j 2 3 itag jtag 2.509788 rij <<
1 73 i j 2 4 itag jtag 3.359584 rij <<
1 49 i j 2 4 itag jtag 3.556079 rij
2 108 i j 3 1 itag jtag 3.600561 rij
2 84 i j 3 1 itag jtag 3.545202 rij
2 24 i j 3 1 itag jtag 3.590037 rij
2 27 i j 3 4 itag jtag 2.492355 rij <<
2 9 i j 3 2 itag jtag 2.509788 rij
2 0 i j 3 1 itag jtag 3.534513 rij <<
2 1 i j 3 2 itag jtag 2.589012 rij
2 3 i j 3 4 itag jtag 2.623361 rij
3 92 i j 4 1 itag jtag 2.572599 rij
3 93 i j 4 2 itag jtag 3.556079 rij
3 84 i j 4 1 itag jtag 2.278151 rij <<
3 105 i j 4 2 itag jtag 3.359584 rij <<
3 2 i j 4 3 itag jtag 2.623361 rij
3 21 i j 4 2 itag jtag 3.720288 rij
3 14 i j 4 3 itag jtag 2.492355 rij <<

The lines marked with “<<” are the “correct” computations, i.e., I expect that the others will be filtered out with the use of appropriate “if” conditions to give me the correct result.
Apologies for the long post, I will be grateful for any help.

Best regards,
Yagyik

Hi everyone,

I am trying to implement a new pair style which requires me to keep track of a number of 1D per-atom arrays. I am using an old-ish release, 7Aug19. but I think that should not be an issue given the simplicity of what I’m trying to do.
If it is of interest to anyone, I am trying to code a double-sum implementation of the 3-body Stillinger-Weber potential (pair_sw)
I am using the pair_eam style as a guide to work out how to communicate and update the per-atom arrays that I have defined. I have also gone through the mail list archive but not exhaustively; forgive me if I’ve missed something obvious.

you don’t seem to be realizing the purpose of a reverse communication: it is to collect contributions that were added to ghost atoms for pairs of atoms that straddle subdomain boundaries newton_pair is 1. so if an atom with tag X is a ghost to atoms on multiple subdomains or different periodic replica of atom X are interacting with local atoms, then their contributions must show up multiple times. you also seem to be missing a significant difference between pair style eam and pair style sw. the former requests a “half” neighbor list (where each pair is listed only once. with newton pair on this applies to all atoms globally, with newton pair off, pairs which straddle subdomain boundaries are listed for both subdomains). also, pair style sw does not allow the newton pair off setting.

the neighbor list construction for half neighbor lists will make certain that each pair will show up only once with newton pair on. for newton pair off, the pairs that straddle subdomain boundaries will exist twice, once where each of the two atoms is the local atom, and thus only contributions to the local atom needs to be added. with full neighbor lists this is different, so for the two-body interactions in pair sw, this selection is done inside the loop and thus reducing a full into a half neighbor list, but it only supports the newton pair on.
pair style sw needs this because it requires the full neighbor list for computing the threebody terms.

please also note that you are incorrectly adding values to the “j” atom index unconditionally. those only need to be added if either newton pair is on or atom j is a local atom.
please also note that there is no need for a reverse communication with newton pair off.

on top of that, you seem to be wanting to use the collected interaction information also from ghost atoms when computing the forces. in that case you have to update the corresponding data for all ghost atoms from their corresponding local atoms and that is done with a forward communication.

please also take note of how pair style eam manages the memory and avoids reallocation in every step. there is no harm done in allocating space for a few more atoms (hence the use of atom->nmax) or a need to shrink arrays. they just need to be large enough. on some operating systems the cost of frequent memory allocation and deallocation can become significant.

axel.

one more comment: when comparing to your self-written code, please take into account that for LAMMPS you (usually) don’t have to worry about minimum image conventions. pairwise cutoffs may be larger than half a box for as long as LAMMPS can map the local atoms to ghost atoms. since the ghost atoms are copies there is no problem with having the same atom show up as periodic replica multiple times.

axel.

Thanks very much, Axel, for your reply.

If I understand correctly:
1) I should request a half neighbourlist (modification in init_style()) for the new implementation because I want to do the computation with two loops and each pair appearing once. For this to work, I should also keep newton_pair=1.
2) Fix the update to the "j" atom index conditional on newton_pair=1 or j < nlocal.
3) I intend to calculate the forces on any local atom "i" using the per-atom array values of "i" and neighbouring "j" including ghost atoms. For this to work correctly, the ghost atoms need to have their per-atom arrays updated which requires a foward_comm . If the local atoms on the current sub-domain are ghosts on other domains, I do not want to collect those contributions because I know that all the relevant contributions for an atom "i" are calculated in the sub-domain where it is local. Therefore, no reverse_comm , irrespective of whether newton_pair is off or on.
4) I can improve efficiency by allocating the memory for the per-atom arrays once to a large value, zero them out upto the relevant index (nall/nlocal) at each call to compute() and pack the buffer up to the relevant index in pack/unpack_forward_comm.

As to your point about ghost atoms being copies,

one more comment: when comparing to your self-written code, please take into account that for LAMMPS you (usually) don't have to worry about minimum image conventions. pairwise cutoffs may be larger than half a box for as long as LAMMPS can map the local atoms to ghost atoms. since the ghost atoms are > copies there is no problem with having the same atom show up as periodic replica multiple times.

again, if I understand correctly
1) The pair distance computed in the pair styles (that I have seen) do not use the minimum image convention because the ghost atoms are spawned to serve that purpose across box boundaries and across subdomains.
2) This I'm unsure of, but it seems you're telling me that even if the same pair interaction is computed multiple times for a given local atom and ghosts sharing the same jtag, the force that i calculate will not be wrong because the ghost atoms are copies. However, the total force on the local particle "i" would be different from the case where I treat a given pair interaction only once with rij = MIN({r_itag,jtag}).

I suspect that some of the "problems" I am facing are simply because I have a small box where the distance between a given "itag" and "jtag" are within the cut-off across multiple boundaries of the simulation box and for a larger system, I would not see this worrying feature. Is that the case?

Once again, thank you for your help.

Best regards,
Yagyik

Thanks very much, Axel, for your reply.

If I understand correctly:

  1. I should request a half neighbourlist (modification in init_style()) for the new implementation because I want to do the computation with two loops and each pair appearing once. For this to work, I should also keep newton_pair=1.

you can use either.
for a full neighbor list all pairs (with local atoms and ghosts) are listed twice (each atom once as “i” and as “j”) and thus results that are summed into arrays must only be added to atom “i”, not atom “j”.
for a half neighbor list all pairs for two local atoms only appear once in the neighbor list, so the resulting contributions must be added to “i” and “j” locations. if one atom list local and the other a ghost then it depends on the newton pair setting. the default (newton pair on) has those also only listed once and thus the results must be summed for ghosts as well, but then you need a reverse communication to add the results from the ghosts to their canonical local atom location. this for newton_pair off you only need to store the result for the “i” atom of such pairs and for newton pair on also to atom “j”. you can see this in the in pair style eam how the “rho” property is collected.
if newton pair is off, no reverse communication is needed.

  1. Fix the update to the “j” atom index conditional on newton_pair=1 or j < nlocal.

you have the choice. it was probably not the best idea to start with pair_sw.cpp as template, but pair_eam.cpp might have been better.

  1. I intend to calculate the forces on any local atom “i” using the per-atom array values of “i” and neighbouring “j” including ghost atoms. For this to work correctly, the ghost atoms need to have their per-atom arrays updated which requires a foward_comm. If the local atoms on the current sub-domain are ghosts on other domains, I do not want to collect those contributions because I know that all the relevant contributions for an atom “i” are calculated in the sub-domain where it is local. Therefore, no reverse_comm, irrespective of whether newton_pair is off or on.

this statement is very confusing. i am looking at the code you provided. in the first loop you are summing the squared distances or properties derived from them into indices “i” and “j” of your custom per atom arrays. for that to be consistent you must use a reverse communication for a half neighbor list with newton pair on (as is what happens in pair style sw which requests a full neighbor list, but then skips over pairs for the 2-body terms to make it a half neighbor list).

  1. I can improve efficiency by allocating the memory for the per-atom arrays once to a large value, zero them out upto the relevant index (nall/nlocal) at each call to compute() and pack the buffer up to the relevant index in pack/unpack_forward_comm.

just look at the rho array in the pair style eam code. I would just zero it with memset() (most efficient) up to nmax (just one line of code).

As to your point about ghost atoms being copies,

one more comment: when comparing to your self-written code, please take into account that for LAMMPS you (usually) don’t have to worry about minimum image conventions. pairwise cutoffs may be larger than half a box for as long as LAMMPS can map the local atoms to ghost atoms. since the ghost atoms are > copies there is no problem with having the same atom show up as periodic replica multiple times.

again, if I understand correctly

  1. The pair distance computed in the pair styles (that I have seen) do not use the minimum image convention because the ghost atoms are spawned to serve that purpose across box boundaries and across subdomains.
  2. This I’m unsure of, but it seems you’re telling me that even if the same pair interaction is computed multiple times for a given local atom and ghosts sharing the same jtag, the force that i calculate will not be wrong because the ghost atoms are copies. However, the total force on the local particle “i” would be different from the case where I treat a given pair interaction only once with rij = MIN({r_itag,jtag}).

one atom will not interact with the same atom multiple times, but can with multiple periodic copies at different distances up to the cutoff.
when you are implementing an MD code that requires minimum image conventions (i.e. when you store each atom only once and have no copies). that LAMMPS is not subject to minimum image conventions comes naturally from the domain decomposition where you need to keep copies atoms from neighboring subdomains. however, if you have only one subdomain, those will just be multiple copies of the same local atoms in your system.

I suspect that some of the “problems” I am facing are simply because I have a small box where the distance between a given “itag” and “jtag” are within the cut-off across multiple boundaries of the simulation box and for a larger system, I would not see this worrying feature. Is that the case?

it means that a code that is based on minimum image conventions cannot produce a correct result in that case.

axel.

Thanks again, Axel.

I'm happy to say that I have managed to reproduce trajectories generated using the existing pair_sw.cpp with the new double loop implementation. For any following this thread or who may encounter the need to specify per-atom arrays in their pair style, I have put the snippet of the working code (with details of the potential abstracted out) at the end of this email. Just to be clear, this is only to reinforce existing discussions that have already covered this and the existing pair styles such as pair_eam.cpp which are useful guides, in case the perspective is useful.

I happen to see a 3x speed up (512 atoms, single processor, 10000 MD steps) in the pair computation time with this implementation despite the inefficiencies in my code - the ineffective memory management which I plan to improve and the use of full neighborlists that are halved/skipped over inside the loop (done to help me debug). I want to prepare this for inclusion in future LAMMPS distributions in case it is of use to anyone. I'd be grateful for any pointers on how to do a proper benchmark and how to prepare it to meet your standards. I am going through the guidelines listed here for now [ 8.6.2. LAMMPS GitHub tutorial — LAMMPS documentation | 8.6.2. LAMMPS GitHub tutorial — LAMMPS documentation ] .

one atom will not interact with the same atom multiple times, but can with multiple periodic copies at different distances up to the cutoff.

BQ_BEGIN

when you are implementing an MD code that requires minimum image conventions (i.e. when you store each atom only once and have no copies). that LAMMPS is not subject to minimum image conventions comes naturally from the domain decomposition where you need to keep copies atoms from neighboring subdomains. however, if you have only one subdomain, those will just be multiple copies of the same local atoms in your system.
it means that a code that is based on minimum image conventions cannot produce a correct result in that case.

BQ_END

I understand this. The comparison with the self-written code is therefore not the correct comparison to make (because minimum image convention is not the correct thing to do) for this case.
A comparison with the energy and particle forces produced by pair_sw.cpp for the same configuration should be the correct comparison to make.

An important part for me was to be able to compare results when I
a) request a half neighborlist, and do "j" updates conditional on newton_pair=1 or j < nlocal . This requires a reverse_comm_pair followed by a forward_comm_pair to sync with the subdomains
b) request a full neighborlist, skip half inside the "j" loop and do "j" updates conditional on newton_pair=1 or j < nlocal. This also requires a reverse_comm_pair followed by a forward_comm_pair.

Thanks Axel for your explanation on this.

Best regards,
Yagyik

void PairSWDS::compute(int eflag, int vflag)
{
int i,j,k,ii,jj,kk,inum,jnum,jnumm1;
int itype,jtype,ktype,ijparam,ikparam,ijkparam;
tagint itag,jtag;
double xtmp,ytmp,ztmp,delx,dely,delz,evdwl,fpair;
double rsq,rsq1,rsq2,rp,rq;
double delr1[3],delr2[3],fj[3],fk[3];
int *ilist,*jlist,*numneigh,**firstneigh;

evdwl = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
//printf("computing DS SW\n");
double **x = atom->x;
double **f = atom->f;
tagint *tag = atom->tag;
int *type = atom->type;
nlocal = atom->nlocal;
nall = nlocal + atom->nghost;
newton_pair = force->newton_pair;
printf("newton pair is %d\n",newton_pair);

inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
memory->destroy(hi);
memory->destroy(si);
memory->create(hi,nall); // oversized if newton_pair=0
memory->create(si,nall);
int looplimit;

if (newton_pair) looplimit=nall;
else looplimit = nlocal;

for(i=0;i<looplimit;i++){
hi[i]=0.0;
si[i]=0.0;
}

for (ii = 0; ii < inum; ii++) {
i = ilist[ii];

itag = tag[i];
itype = map[type[i]];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
jlist = firstneigh[i];
jnum = numneigh[i];

for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
j &= NEIGHMASK;
jtag = tag[j];

// Skip half entries of full neighborlist with lines below
//int irequest = neighbor->request(this);
//neighbor->requests[irequest]->half = 0;
//neighbor->requests[irequest]->full = 1;
if (itag > jtag) {
if ((itag+jtag) 2 == 0\) continue; \} else if \(itag &lt; jtag\) \{ if \(\(itag\+jtag\) 2 == 1) continue;
} else {
if (x[j][2] < ztmp) continue;
if (x[j][2] == ztmp && x[j][1] < ytmp) continue;
if (x[j][2] == ztmp && x[j][1] == ytmp && x[j][0] < xtmp) continue;

}

jtype = map[type[j]];

delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;

ijparam = elem2param[itype][jtype][jtype];
if (rsq > params[ijparam].cutsq ) continue; // apt cut-off for given pair of atoms, their coeffs
rij=sqrt(rsq);

hi[i] += exp(rsq);
si[i] += 2*rsq;

if(newton_pair || j < nlocal) {
hi[j] += exp(rsq);
si[j] += 2*rsq;
}
}

}
if (newton_pair) comm->reverse_comm_pair(this);
comm->forward_comm_pair(this);

for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
//printf("hi i, i %f %d\n",hi[i],i);
itag = tag[i];
itype = map[type[i]];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
jlist = firstneigh[i];
jnum = numneigh[i];

for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
j &= NEIGHMASK;
jtag = tag[j];

// Skip half entries of full neighborlist with lines below
// check init_style() for
//int irequest = neighbor->request(this);
//neighbor->requests[irequest]->half = 0;
//neighbor->requests[irequest]->full = 1;
if (itag > jtag) {
if ((itag+jtag) 2 == 0\) continue; \} else if \(itag &lt; jtag\) \{ if \(\(itag\+jtag\) 2 == 1) continue;
} else {
if (x[j][2] < ztmp) continue;
if (x[j][2] == ztmp && x[j][1] < ytmp) continue;
if (x[j][2] == ztmp && x[j][1] == ytmp && x[j][0] < xtmp) continue;

}

jtype = map[type[j]];

delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;

ijparam = elem2param[itype][jtype][jtype];
if (rsq > params[ijparam].cutsq ) continue; // apt cut-off for given pair of atoms, their coeffs

evdwl = 1/rsq + hi[i] - hi[j] + si[i] + si[j];
fijx = (hi[i] - hi[j] + si[i] + si[j])*delx/sqrt(rsq);
fijy = (hi[i] - hi[j] + si[i] + si[j])*dely/sqrt(rsq);
fijz = (hi[i] - hi[j] + si[i] + si[j])*delz/sqrt(rsq);
f[i][0] += fijx;
f[i][1] += fijy;
f[i][2] += fijz;
if( newton_pair || j < nlocal){
f[j][0] -= fijx;
f[j][1] -= fijy;
f[j][2] -= fijz;
}
if (evflag) ev_tally_xyz(i,j,nlocal,newton_pair,evdwl,0.0,
fijx,fijy,fijz,delx,dely,delz);
}
}
}

int PairSWDS::pack_forward_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;

m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = hi[j];
buf[m++] = si[j];
}
return m;
}

void PairSWDS::unpack_forward_comm(int n, int first, double *buf)
{
int i,ii,m,last;

m = 0;
last = first + n;
for (i = first; i < last; i++)
{
hi[i] = buf[m++];
si[i] = buf[m++];
}
}

int PairSWDS::pack_reverse_comm(int n, int first, double *buf)
{
int ii,i,m,last;
int count=0;
m = 0; // buffer index counter - buffer entries not indexed with i directly
last = first + n;
for (i = first; i < last; i++) // loops over indices split to the different procs
{
buf[m++] = hi[i];
buf[m++] = si[i];
}

return m; // size of the full buffer that is communicated
}

/* ---------------------------------------------------------------------- */

void PairSWDS::unpack_reverse_comm(int n, int *list, double *buf)
{
int ii,i,j,m;

m = 0; // final value is size of communicated buffer - entries in buffer are not indexed with i
printf("unpacking %d\n",n);
for (ii = 0; ii < n; ii++) // loops over indices provided from the other processors
{
j = list[ii];
//printf("%d %d %d %f %f\n",i,j,m,buf[m],hi[j]);
hi[j] += buf[m++];
si[j] += buf[m++];
}
}

[…]

listed here for now https://lammps.sandia.gov/doc/Howto_github.html .

this is more a tutorial for the “mechanics” of submitting a contribution via github. the detailed information in currently in multiple places.
we are currently in the process or updating the guidelines and trying to integrate everything into a single document. you can find the work-in-progress and the links to the corresponding individual documents (not everything is about contributing code) here: https://github.com/lammps/lammps/issues/2643

i don’t spend much time on benchmarking, so I don’t want to give any advice there. this is more the domain of Steve Plimpton or Stan Moore.
I worry more about maintainability and functionality and similar issues.

Axel.