CMAP atoms missing on proc error

Dear All,

I am trying to simulate a protein in water using CHARMM36m and keep getting a “CMAP atoms missing” error. It usually doesn’t happen immediately, but after a few hundred ps. I have inspected the trajectory just before the error and nothing seems to be blowing up, which I understand is generally the cause of these types of errors.

I thought it might be a communication issue, so I increased the communication cutoff to 16 angstroms. This does make the error less frequent, but I don’t fully understand why, since the distances between the atoms never seem to get close to this value. Reducing the number of ranks also makes the error less frequent (for reference, I am simulating 10,000 atoms on 16 ranks). Again, this is likely my ignorance, but I don’t understand why this would help if the distance between the atoms never gets close to the communication cutoff.
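For a sense of scale, here is a rough sketch estimating the subdomain size for this system. It assumes a cubic box of pure water at about 0.1 atoms per cubic angstrom and a processor grid that minimizes each subdomain's surface area, which is, to my understanding, close to the default choice of the `processors` command; the real box and grid may differ. It only shows how subdomains shrink as ranks are added and does not by itself explain the error. The function name `best_grid` is hypothetical.

```python
# Rough subdomain-size estimate. Assumptions: cubic box, water-like density,
# and a processor grid that minimizes the surface area of one subdomain.


def best_grid(nprocs, box):
    """Return (grid, subdomain_lengths) minimizing one subdomain's surface area."""
    best = None
    for px in range(1, nprocs + 1):
        if nprocs % px:
            continue
        rest = nprocs // px
        for py in range(1, rest + 1):
            if rest % py:
                continue
            pz = rest // py
            sx, sy, sz = box[0] / px, box[1] / py, box[2] / pz
            area = sx * sy + sy * sz + sx * sz
            if best is None or area < best[0]:
                best = (area, (px, py, pz), (sx, sy, sz))
    return best[1], best[2]


natoms = 10000
density = 0.1  # atoms per cubic angstrom, roughly liquid water (assumed)
side = (natoms / density) ** (1.0 / 3.0)  # edge of an assumed cubic box
grid, sub = best_grid(16, (side, side, side))
print(f"box edge ~{side:.1f} A, grid {sorted(grid)}")        # box edge ~46.4 A, grid [2, 2, 4]
print("subdomain edges (A):", sorted(round(s, 1) for s in sub))  # [11.6, 23.2, 23.2]
```

Under these assumptions the thinnest subdomain is only about 11.6 angstroms across, and it gets thinner as more ranks are used.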

Some additional insight on how best to debug would be very useful.

Thanks,
Jamie

The first thing to check would be that the list of atoms in the CMAP section of the data file is correct.
There should be 7 numbers per line: the first is a counter, the second is the CMAP type, and the remaining 5 are the IDs of the atoms involved in the CMAP cross term. Those 5 atoms need to be connected to each other by bonds and span 3 consecutive amino acids.
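A minimal Python sketch of that check, assuming the usual LAMMPS data-file layout in which each `Bonds` line reads `bond-ID bond-type atom1 atom2` and each `CMAP` line reads `counter type a1 a2 a3 a4 a5`. It verifies that each consecutive pair of the five atoms is bonded (four bonds in total). The helper names `parse_section` and `check_cmap` are hypothetical, and pulling the section lines out of the data file is left out.

```python
def parse_section(lines, ncols):
    """Return rows of ints, skipping blank lines and '#' comments."""
    rows = []
    for line in lines:
        text = line.split("#", 1)[0].strip()
        if not text:
            continue
        fields = text.split()
        if len(fields) != ncols:
            raise ValueError(f"expected {ncols} columns, got {len(fields)}: {line!r}")
        rows.append([int(f) for f in fields])
    return rows


def check_cmap(bond_lines, cmap_lines):
    """Return (counter, missing_pairs) for every CMAP term whose five atoms
    are not linked by four consecutive bonds."""
    bonds = {frozenset((i, j)) for _, _, i, j in parse_section(bond_lines, 4)}
    problems = []
    for row in parse_section(cmap_lines, 7):
        counter, atoms = row[0], row[2:]
        missing = [(a, b) for a, b in zip(atoms, atoms[1:])
                   if frozenset((a, b)) not in bonds]
        if missing:
            problems.append((counter, missing))
    return problems


# Toy example: atoms 1-2-3-4-5 are bonded in a chain, atom 6 is not bonded to 4.
bonds = ["1 1 1 2", "2 1 2 3", "3 1 3 4", "4 1 4 5"]
print(check_cmap(bonds, ["1 1 1 2 3 4 5"]))  # []
print(check_cmap(bonds, ["2 1 1 2 3 4 6"]))  # [(2, [(4, 6)])]
```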

My guess would be that somehow this chain was interrupted and thus the atoms in the CMAP cross term are slowly diffusing away from each other. For fix cmap to work, all 5 atoms must be accessible on the subdomain, either as local atoms or as ghost atoms. Even if all 4 connecting bonds were stretched out linearly, a communication cutoff of about 6 angstroms would be enough to have all atoms present in the same subdomain.
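The 6 angstrom estimate is simple arithmetic. The sketch below adds up four typical peptide-backbone bond lengths for the atoms C(i-1), N(i), CA(i), C(i), N(i+1) (approximate textbook values, not taken from the force-field files) and compares the total with the LAMMPS default ghost-atom cutoff, which is the pairwise cutoff plus the neighbor skin. The 12 angstrom cutoff and 2 angstrom skin are assumed example values; substitute the ones from your own input.

```python
# Upper bound on the span of the five CMAP atoms C(i-1)-N(i)-CA(i)-C(i)-N(i+1):
# the sum of the four bond lengths, reached only if the chain were straight.
# Bond lengths are approximate typical peptide values in angstrom.
bond_lengths = [
    1.33,  # C(i-1)-N(i), peptide bond
    1.46,  # N(i)-CA(i)
    1.52,  # CA(i)-C(i)
    1.33,  # C(i)-N(i+1), peptide bond
]
max_span = sum(bond_lengths)

# Default ghost cutoff = pairwise cutoff + neighbor skin.
pair_cutoff = 12.0  # assumed example value
skin = 2.0          # assumed example value
default_comm_cutoff = pair_cutoff + skin

print(f"fully extended span: {max_span:.2f} A")              # 5.64 A
print(f"default ghost cutoff: {default_comm_cutoff:.1f} A")  # 14.0 A
print("span fits within cutoff:", max_span < default_comm_cutoff)  # True
```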

Thanks for the advice. I’ve gone through the data file and there don’t seem to be any problems with the CMAP section. I set the dump frequency to 1 and inspected the trajectory. The atoms involved in the error don’t seem to diffuse away from each other; they are only around 5 angstroms apart when the error appears.
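When measuring such distances from a dump, periodic images matter: two bonded atoms on opposite sides of the box look far apart in wrapped coordinates. Below is a small sketch, assuming an orthogonal box and wrapped coordinates, that computes the largest minimum-image distance among a set of atoms. The function names and the coordinates in the example are made up for illustration; a triclinic box would need the full cell matrix.

```python
import itertools
import math


def min_image_distance(r1, r2, box):
    """Distance between two points in an orthogonal periodic box (minimum image)."""
    total = 0.0
    for a, b, length in zip(r1, r2, box):
        d = b - a
        d -= length * round(d / length)  # shift to the nearest periodic image
        total += d * d
    return math.sqrt(total)


def max_pair_distance(coords, box):
    """Largest minimum-image distance between any two of the given atoms."""
    return max(min_image_distance(p, q, box)
               for p, q in itertools.combinations(coords, 2))


# Three chained atoms straddling the x boundary of a 40 A box (made-up numbers).
box = (40.0, 40.0, 40.0)
coords = [(39.0, 10.0, 10.0), (0.3, 10.0, 10.0), (1.6, 10.0, 10.0)]
print(round(max_pair_distance(coords, box), 2))  # 2.6, not the raw 37.4
```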

There are four things you should do now.

  1. We need to know exactly which LAMMPS version you are using, with which compiler settings, which options, etc.: the works. Check the forum guidelines for details.
  2. You should remove the comm_modify command that increases the communication cutoff. The default cutoff should be more than good enough.
  3. See what happens if you write a restart (and data) file before the time a crash typically happens, then restart from either one and see whether the crash happens around the same timestep as before or (much) later.
  4. Try to set up a smaller test system that runs faster, and check whether the error then happens sooner.