Issue with multiple force computations per step

Dear community,

I am experimenting with higher-order time integration algorithms (as compared to Velocity Verlet). These algorithms require more than one force evaluation per MD time step. I started with a copy of “verlet.cpp” and implemented my method there. In serial execution it works completely fine, and in parallel it also works fine with “newton off”. With “newton on”, however, deviations well beyond typical round-off errors appear after only a few steps.

So it seems that the positions of the ghost atoms are not correctly distributed to all processors for some of the intra-step force calculations. I always call comm->forward_comm(); before every force calculation. Could it be that the communication subsystem gets confused because the time step number stays the same across multiple force evaluations? Any other ideas on how I could approach or debug this?

Best regards,
Martin

For newton on you also need to do a reverse communication to collect the forces accumulated on ghost atoms. You also need to clear the forces on the ghost atoms in addition to those on the local atoms.

Thanks for the quick reply. Since I started from the original verlet.cpp code, the conditional reverse communication for the newton on case is already there, and I kept it after every force evaluation. The clearing of the forces is done via the force_clear() routine from verlet.cpp before every force computation, which also takes care of clearing ghost atom forces in the newton on case.

I really copied most of the code from verlet.cpp; I just call the part which computes the forces more than once.

If you don’t see a fundamental issue with my approach, I will continue with debugging it. I just wanted to make sure that this approach (multiple force computations within the same MD step) could work in principle.

Any further hints are welcome :slight_smile:

Best regards,
Martin

Just in case it is of interest, here is my force calculation routine which I call multiple times per step (stripped of the “timer” calls).

void MyIntegrator::calc_force() {

    // update ghost atom positions with the new local positions
    comm->forward_comm();

    // zero the local forces (and ghost forces in the newton on case)
    force_clear();

    if (modify->n_pre_force)
      modify->pre_force(vflag);

    if (pair_compute_flag)
      force->pair->compute(eflag,vflag);

    if (atom->molecular != Atom::ATOMIC) {
      if (force->bond) force->bond->compute(eflag,vflag);
      if (force->angle) force->angle->compute(eflag,vflag);
      if (force->dihedral) force->dihedral->compute(eflag,vflag);
      if (force->improper) force->improper->compute(eflag,vflag);
    }

    if (kspace_compute_flag)
      force->kspace->compute(eflag,vflag);

    if (modify->n_pre_reverse)
      modify->pre_reverse(eflag,vflag);

    // collect the force contributions accumulated on ghost atoms
    if (force->newton)
      comm->reverse_comm();

    if (modify->n_post_force_any)
      modify->post_force(vflag);
}

It is not entirely clear what you mean by this, and you might need to describe what you are seeing in more specific technical terms to get help. (For one, the particles’ position coordinates in atom->x[i][j] are never changed during a force calculation; they only change during initial_integrate and/or final_integrate in the integrator fix, for example fix_nh.cpp.)

Again, you will have to be more specific about what you mean technically. If you look at the comm code, you will see that the communication calls are only used either to send information from a “local” atom on its processor to all of its “ghosts” (which can even be on the same proc, since LAMMPS also handles periodic images as ghosts), or from all ghosts back to the local atom. Thus, comm just needs to know which ghosts on which procs correspond to a particular local atom; it does not “know” many other things, such as what timestep the simulation is on.
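To make this concrete, here is a minimal toy model of what the two directions accomplish (plain self-contained C++, entirely made up for illustration; x_own, f_own and ghost_of are invented names, not LAMMPS code):

// Toy model of the two communication directions (illustration only).
// forward: copy a per-atom value from each owned atom to all its ghosts.
// reverse: sum the ghost contributions back into the owned atom.
#include <cstdio>
#include <vector>

int main() {
  std::vector<double> x_own = {1.0, 2.0};   // positions of two owned atoms
  std::vector<double> f_own = {0.0, 0.0};   // forces on the owned atoms
  std::vector<int> ghost_of = {0, 0, 1};    // ghost i copies owned atom ghost_of[i]
  std::vector<double> x_ghost(3), f_ghost(3);

  // forward_comm: owned -> ghosts (e.g. after a position update)
  for (size_t i = 0; i < ghost_of.size(); i++)
    x_ghost[i] = x_own[ghost_of[i]];

  // ... a force computation would accumulate partial forces on ghosts here ...
  f_ghost = {0.1, 0.2, 0.3};

  // reverse_comm: ghosts -> owned, summed (what newton on requires)
  for (size_t i = 0; i < ghost_of.size(); i++)
    f_own[ghost_of[i]] += f_ghost[i];

  printf("f_own = %g %g\n", f_own[0], f_own[1]);   // prints 0.3 0.3
  return 0;
}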

Thanks for your reply.

As I wrote, I implemented a higher-order integrator which modifies the atom coordinates multiple times within one MD time step, and therefore also performs multiple force calculations within one MD step (using the routine I posted above as source code). I don’t call initial_integrate and final_integrate. The time propagation of atom positions and velocities is completely handled within my replacement code for Verlet. It works correctly for serial runs, and also for parallel runs with newton off. Just not for parallel runs with newton on.

I would be happy to post more parts of my source code, I just don’t want to spam the thread with unnecessary code blocks. If you would like to have a look, I can upload it all. Or additional specific parts of it, if you have a hypothesis where the issue could arise from.

That should be fine. Since the communication works once per step in the original LAMMPS code, I would assume it also works multiple times per step after the atom coordinates have been modified intra-step.

One more point is the neighbor list formalism. My integrator calls the neighbor list update code (checking, and possibly re-neighboring, just as in verlet.cpp) only once at the beginning of the time step, not after each intra-step atom position update. I can ensure that no atom moves further than the skin depth within an MD step, so it should be fine. I have to do it that way because the integrator has to store temporary values for each local atom during a time step. If atoms were re-assigned to other processors in the middle of a time step after an intra-step position update, all these temporary values would become invalid.
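If it helps, that assumption could even be checked at runtime with a guard like the following sketch after each intra-step position update (x0 is a hypothetical array holding the positions at the start of the MD step; the half-skin criterion loosely mirrors the displacement check behind neighbor->decide()):

// Sketch: abort if any owned atom moved more than half the neighbor skin
// since the start of the step. x0[] is hypothetical; triclinic boxes and
// includegroup settings would need extra care.
double skinsq = 0.25 * neighbor->skin * neighbor->skin;   // (skin/2)^2
for (int i = 0; i < atom->nlocal; i++) {
  double dx = atom->x[i][0] - x0[3*i];
  double dy = atom->x[i][1] - x0[3*i+1];
  double dz = atom->x[i][2] - x0[3*i+2];
  if (dx*dx + dy*dy + dz*dz > skinsq)
    error->one(FLERR, "Atom moved more than skin/2 within one step");
}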

Any further hypotheses what breaks my code in the newton on case would be appreciated.

Best regards,
Martin

For this to work properly requires total consistency across all “ghosts” – that is, any time an atom’s position or velocity “is updated” on its local processor, the same updates must consistently be applied to every one of its ghost atoms.

My guess is that your code is failing to achieve this consistency. It is impossible to know (1) whether this guess is correct and (2) what in your code causes the inconsistency unless someone sits down with your code for an extended time to find the bug.

You are certainly welcome to upload your source code in whatever format you like, or even point us to a GitHub or similar repository, but it does not guarantee that someone will have time to look at it. :slight_smile:

Thanks. I was hoping that this consistency would be maintained automatically by calling comm->forward_comm() on all MPI ranks after modifying the local atom positions and before entering the force calculation. Could you explain some cases in which the consistency might be broken despite this communication call?
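For debugging this, I am considering a consistency check along the following lines (my own sketch; it assumes an atom map is enabled via atom_modify map yes, and the exact minimum_image() signature may vary between LAMMPS versions):

// Sketch: after comm->forward_comm(), every ghost that images a locally
// owned atom should match its owner's coordinates up to a periodic shift.
int nlocal = atom->nlocal;
for (int i = nlocal; i < nlocal + atom->nghost; i++) {
  int j = atom->map(atom->tag[i]);      // owned copy of this tag, if local
  if (j < 0 || j >= nlocal) continue;   // owner lives on another rank
  double dx = atom->x[i][0] - atom->x[j][0];
  double dy = atom->x[i][1] - atom->x[j][1];
  double dz = atom->x[i][2] - atom->x[j][2];
  domain->minimum_image(dx, dy, dz);    // remove the periodic image offset
  if (dx*dx + dy*dy + dz*dz > 1.0e-12)
    printf("ghost/owner mismatch for tag " TAGINT_FORMAT "\n", atom->tag[i]);
}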

I managed to strip the whole code down to around 200 lines; please find it below. I removed all “non-critical” parts such as timer calls and the integrator mathematics itself (which works, see the newton off case). The class is a drop-in replacement for Verlet (see verlet.cpp).

I know it is bad style to have the integration directly in there… Fixes such as nve or nvt are simply ignored; they are never called. The code is not meant to be published like this, I just want to find the bug.

void MyIntegrator::calc_neighbors( int ntimestep ) {

  int nflag, sortflag;

  if (atom->sortfreq > 0) sortflag = 1;
  else sortflag = 0;

  // decide whether the neighbor lists need to be rebuilt this step
  nflag = neighbor->decide();

  if (nflag != 0) {
    if (modify->n_pre_exchange)
      modify->pre_exchange();
    if (triclinic) domain->x2lamda(atom->nlocal);
    domain->pbc();
    if (domain->box_change) {
      domain->reset_box();
      comm->setup();
      if (neighbor->style) neighbor->setup_bins();
    }
    // migrate atoms to their new owning procs, then set up ghost atoms
    comm->exchange();
    if (sortflag && ntimestep >= atom->nextsort) atom->sort();
    comm->borders();
    if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
    if (modify->n_pre_neighbor)
      modify->pre_neighbor();
    neighbor->build(1);
    timer->stamp(Timer::NEIGH);
    if (modify->n_post_neighbor)
      modify->post_neighbor();
  }
}


void MyIntegrator::force_clear()
{
  size_t nbytes;

  if (external_force_clear) return;

  int nlocal = atom->nlocal;

  if (neighbor->includegroup == 0) {
    nbytes = sizeof(double) * nlocal;
    if (force->newton) nbytes += sizeof(double) * atom->nghost;

    if (nbytes) {
      memset(&atom->f[0][0],0,3*nbytes);
      if (torqueflag) memset(&atom->torque[0][0],0,3*nbytes);
      if (extraflag) atom->avec->force_clear(0,nbytes);
    }

  } else {
    nbytes = sizeof(double) * atom->nfirst;

    if (nbytes) {
      memset(&atom->f[0][0],0,3*nbytes);
      if (torqueflag) memset(&atom->torque[0][0],0,3*nbytes);
      if (extraflag) atom->avec->force_clear(0,nbytes);
    }

    if (force->newton) {
      nbytes = sizeof(double) * atom->nghost;

      if (nbytes) {
        memset(&atom->f[nlocal][0],0,3*nbytes);
        if (torqueflag) memset(&atom->torque[nlocal][0],0,3*nbytes);
        if (extraflag) atom->avec->force_clear(nlocal,nbytes);
      }
    }
  }
}


void MyIntegrator::calc_force( int ntimestep ) {

    comm->forward_comm();

    force_clear();

    if (modify->n_pre_force)
      modify->pre_force(vflag);

    if (pair_compute_flag)
      force->pair->compute(eflag,vflag);

    if (atom->molecular != Atom::ATOMIC) {
      if (force->bond) force->bond->compute(eflag,vflag);
      if (force->angle) force->angle->compute(eflag,vflag);
      if (force->dihedral) force->dihedral->compute(eflag,vflag);
      if (force->improper) force->improper->compute(eflag,vflag);
    }

    if (kspace_compute_flag)
      force->kspace->compute(eflag,vflag);

    if (modify->n_pre_reverse)
      modify->pre_reverse(eflag,vflag);

    if (force->newton)
      comm->reverse_comm();

    if (modify->n_post_force_any)
      modify->post_force(vflag);
}


void MyIntegrator::run(int n) {

  bigint ntimestep;

  int igroup = group->find("all");
  if (igroup == -1)
    error->all(FLERR, "Could not find group \"all\"");
  int groupbit = group->bitmask[igroup];

  double h = update->dt;

  std::vector<double> qorig, fqtemp;

  // Loop over MD steps
  for (int i = 0; i < n; i++) {

    if (timer->check_timeout(i)) {
      update->nsteps = i;
      break;
    }

    ntimestep = ++update->ntimestep;

    ev_set(ntimestep);

    // Check neighbor lists once at the beginning of each MD step
    calc_neighbors(ntimestep);

    double **x = atom->x;
    double **v = atom->v;
    double **f = atom->f;
    double *rmass = atom->rmass;
    double *mass = atom->mass;
    int *type = atom->type;
    int *mask = atom->mask;
    int nlocal = atom->nlocal;
    if (igroup == atom->firstgroup)
      nlocal = atom->nfirst;

    if (qorig.size() < (size_t) (3*nlocal)) {
      qorig.resize(3*nlocal);
      fqtemp.resize(3*stages*nlocal);
    }

    // Store the initial atom positions
    for (int z = 0; z < nlocal; z++) {
      // note: "mask[z] & groupbit == 0" parses as mask[z] & (groupbit == 0)
      if (!(mask[z] & groupbit)) continue;
      qorig[z*3  ] = x[z][0];
      qorig[z*3+1] = x[z][1];
      qorig[z*3+2] = x[z][2];
    }

    // Go through the stages of the integrator.
    // Every stage involves one position update and a subsequent force calculation.
    for (int zs = 0; zs < stages; zs++) {

      // Set positions to some new values (the actual integrator, omitted here)
      for (int z2 = 0; z2 < nlocal; z2++) {
        if (!(mask[z2] & groupbit)) continue;
        x[z2][0] = /* Some Value */;
        x[z2][1] = /* Some Value */;
        x[z2][2] = /* Some Value */;
      }

      calc_force(ntimestep);

      // Store accelerations at the new positions
      for (int z2 = 0; z2 < nlocal; z2++) {
        if (!(mask[z2] & groupbit)) continue;
        if (rmass) {
          fqtemp[zs*3*nlocal+z2*3  ] = f[z2][0] * force->ftm2v / rmass[z2];
          fqtemp[zs*3*nlocal+z2*3+1] = f[z2][1] * force->ftm2v / rmass[z2];
          fqtemp[zs*3*nlocal+z2*3+2] = f[z2][2] * force->ftm2v / rmass[z2];
        } else {
          fqtemp[zs*3*nlocal+z2*3  ] = f[z2][0] * force->ftm2v / mass[type[z2]];
          fqtemp[zs*3*nlocal+z2*3+1] = f[z2][1] * force->ftm2v / mass[type[z2]];
          fqtemp[zs*3*nlocal+z2*3+2] = f[z2][2] * force->ftm2v / mass[type[z2]];
        }
      }
    }

    // Update also the velocities at the end of the step
    for (int z2 = 0; z2 < nlocal; z2++) {
      if (!(mask[z2] & groupbit)) continue;
      v[z2][0] += /* Some Value */;
      v[z2][1] += /* Some Value */;
      v[z2][2] += /* Some Value */;
    }

    if (modify->n_end_of_step) modify->end_of_step();

    if (ntimestep == output->next)
      output->write(ntimestep);
  }
}

I think I have tracked it down a little bit further.

In the original source code from verlet.cpp, the neighbor list check/update is performed between modify->initial_integrate() and the force calculations, i.e. after the atom positions have been changed by the integrator.

When I move the neighbor list check/update to the front before modify->initial_integrate(), I get the same behavior as described above: A serial run still gives correct results, but a parallel run with newton on shows significant deviations.

Can someone explain to me why this happens? My neighbor skin depth is chosen such that no atom can possibly move that distance within one time step, so it should not matter whether I call the neighbor list update before or after the atom positions are modified… In a serial run, that seems to be the case. Why does it suddenly matter in a parallel run?

Best regards,
Martin

Sorry for the repeated posting. I believe I found the issue now.

During a neighbor list rebuild, the local atoms on each MPI rank are possibly reshuffled (or new atoms appear / old atoms leave that domain). Therefore, the atom position and velocity arrays are obviously reshuffled as well. For example, the entry atom->x[17] does not necessarily correspond to the same physical atom before and after the neighbor list rebuild. So far so good (all as expected).

Could it be that this reshuffling is omitted for the forces, atom->f? In the typical LAMMPS workflow, forces seem never to be accessed across a neighbor list rebuild, so maybe the force arrays are simply left out of the reshuffle for performance reasons. If this is true, it immediately explains why my code fails: if I rebuild the neighbor lists at the beginning of each step and then try to read out the old forces on the atoms (from the last MD step), those forces no longer necessarily belong to the same atoms…
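A quick way to test this theory would be a tag-based comparison around the rebuild, something like this diagnostic sketch (tags are persistent atom IDs, so if atom->f were migrated consistently, every surviving tag would show bit-identical forces):

// Diagnostic sketch: record forces by atom tag before the neighbor
// rebuild, then compare after it (requires #include <map> and <array>).
std::map<tagint, std::array<double,3>> fold;
for (int i = 0; i < atom->nlocal; i++)
  fold[atom->tag[i]] = {atom->f[i][0], atom->f[i][1], atom->f[i][2]};

calc_neighbors(ntimestep);   // possible exchange/borders/rebuild

for (int i = 0; i < atom->nlocal; i++) {
  auto it = fold.find(atom->tag[i]);
  if (it != fold.end() && it->second[0] != atom->f[i][0])
    printf("force mismatch for tag " TAGINT_FORMAT "\n", atom->tag[i]);
}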

If my theory is correct: Is there any way to tell LAMMPS to also reshuffle the atom->f arrays during neighbor list rebuilds, so that the old forces are still valid afterwards? If there is no switch to do so, I would also be fine with a small source code modification if you point me to the correct place.

Thanks and best regards,
Martin

Sorry, but this makes no sense. The force array is cleared and recomputed, so there is no way to access “old” forces unless you make a copy of them.

Perhaps you should have a look at the implementation of run style respa for a change, since that one actually makes copies of forces (and torques) so that a copy can be reused instead of being recomputed (that is how r-RESPA works).

Now, if you need copies of per-atom properties, you can create an instance of fix STORE/ATOM (internally) and the fix will move the per-atom data with the atoms during atom migration for you.
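As a sketch of that pattern (it follows what compute displace/atom does internally; the exact argument list of the internal fix can differ between LAMMPS versions, so check fix_store_atom.cpp in your tree):

// Sketch: create an internal STORE/ATOM fix with 3 doubles per atom; the
// fix migrates its per-atom data together with the atoms. Requires
// #include "fix_store_atom.h"; "3 0 0 1" mirrors compute displace/atom.
auto *fix = dynamic_cast<FixStoreAtom *>(
    modify->add_fix("my_store all STORE/ATOM 3 0 0 1"));

// save the current forces into the fix's per-atom array ...
double **stored = fix->astore;
for (int i = 0; i < atom->nlocal; i++)
  for (int k = 0; k < 3; k++) stored[i][k] = atom->f[i][k];

// ... later, even after a neighbor rebuild, stored[i] still belongs to
// the same physical atom as local index i.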

More generally, the LAMMPS classes that do calculations (fixes, pair styles, etc.) have mechanisms to customise how Comm packs and unpacks data for their needs. Modify typically delegates those calculations to them, while Comm itself offers far less customisation (technically none, though it “talks to” commands like comm_modify vel yes).

I hope you are debugging your code somehow (ideally with gdb, but even just printing out the contents of various vectors after lines that might have bugs). As you can tell, it is quite hard for you to even figure out what to ask, let alone for us to answer.

I find that a useful template in such situations is “After running line X to do process P, my hand calculation says vector V should hold number N, but it’s holding number M instead, please help.”

An exact “why” is very difficult to isolate, both remotely and in person. An exact “where” the problem is and “what” size the discrepancy has will immediately be much more informative.


Thanks for your replies.

I now opted to edit the pack and unpack routines in atom_vec.cpp so that the forces are included each time the velocities are packed and unpacked. This works very well so far: results from parallel runs are now fully correct. I know this is not a solution that would make it into any LAMMPS release, but at least it works locally for me. I will look into the respa solution, thanks for the hint; it sounds a lot cleaner.
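For reference, the change is along these lines (against a recent LAMMPS where the per-atom fields communicated by each operation are declared as string lists in the AtomVec constructors; names and defaults differ between versions, so treat this as a sketch of my local hack, not a recipe):

// Sketch (in AtomVecAtomic's constructor): append "f" to the field lists
// so forces travel with the atoms wherever the velocities do.
AtomVecAtomic::AtomVecAtomic(LAMMPS *lmp) : AtomVec(lmp)
{
  // ... original setup ...
  fields_exchange = {"f"};     // migrate forces in comm->exchange()
  fields_border_vel = {"f"};   // and in comm->borders() with velocities
  setup_fields();
}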

I honestly cannot fully follow that argument. Through intense debugging (using, among others, the tools you mentioned, which I have been doing for decades), I was able to track down the issue step by step and finally identify and solve it. I agree that my in-between postings were perhaps a little premature and I could have omitted them. But did you find my questions so imprecise? I accept that opinion.

Thanks and best regards,
Martin