[BUG] CMAP atoms missing after reset_atoms id

alphataubio · December 10, 2024, 6:33am

ResetAtomsID::command() doesnt update atom ids in **crosstermlist of FixCMAP causing the error:

# EXTRACELLULAR CONCENTRATIONS
delete_atoms random count 689 yes k NULL 12345 compress no
Deleted 689 atoms, new total = 740889
delete_atoms random count 35 yes na NULL 12345 compress no
Deleted 35 atoms, new total = 740854
delete_atoms random count 94 yes ca NULL 12345 compress no
Deleted 94 atoms, new total = 740760
delete_atoms random count 88 yes mg NULL 12345 compress no
Deleted 88 atoms, new total = 740672
delete_atoms random count 1088 yes cl NULL 12345 compress no
Deleted 1088 atoms, new total = 739584
reset_atoms id sort yes
Resetting atom IDs ...

[...]

ERROR on proc 0: CMAP atoms 10856 10858 10860 10873 10875 missing on proc 0 at step 0 (src/MOLECULE/fix_cmap.cpp:265)
Last command: minimize 0 0 10000 20000

(1) how would i add a CMAP loop in ResetAtomsID::command() in a way that works whether MOLECULE package is included or not ?

(2) how do i reach **crosstermlist of FixCMAP class from ResetAtomsID ?

This might be related to CMAP atoms missing on proc error.

alphataubio · December 10, 2024, 7:08am

UPDATE…

if i dont reset_atoms then i get:

ERROR: Atom IDs must be consecutive for velocity create loop all (src/velocity.cpp:264)

and with velocity all create 310.15 $((part+1)*12345) dist gaussian loop local instead, i get the classic lost atoms with an insane starting pressure:

   Part        Step          Time          S/CPU         CPULeft         TotEng          Temp          Press          Volume     v_diffusion_coeff
         2           0   0              0              0             -1960016.7      463.05524     -1.2772529e+293  8000000        0            
ERROR on proc 0: Out of range atoms - cannot compute PPPM (src/KSPACE/pppm.cpp:1832)

full input script attached
tg2-charmm.in (4.5 KB)

srtee · December 10, 2024, 11:45am

There are two approaches you can take:

The quick patch

Have reset_atom_ids cycle all fixes and call their respective fix->update_tags_after_reset_ids functions:

Add a placeholder function to the Fix base class: void update_tags_after_reset_ids(*tagint oldIDs, *tagint newIDs) {}.
Now reset_atom_ids can call all fixes that need to update their IDs before cleaning up:

  // NEW -- call fixes to update their IDs

  for (int i = 0; i++; i < modify->nfix) {
    modify->fix[i]->update_tags_after_reset_ids(oldIDs, newIDs);
  }

  // clean up

  memory->destroy(oldIDs);
  memory->destroy(newIDs);

Write FixCMAP’s override that will take the old IDs and new IDs list and update its internal data structures properly.

The correct thing to do

So, if I understand correctly, CMAP is basically … a cross-dihedral term?

That means every CMAP “five-body interaction” is in fact completely specified by (1) a “CMAP carbon” flag which is 1 for any CMAP alpha carbon and 0 otherwise; (2) for each CMAP alpha carbon, three numbers stating its psi-dihedral type, phi-dihedral type, and CMAP type respectively. The “CMAP topology” can be uniquely reconstructed from this specification as long as each alpha carbon has exactly one psi-dihedral and one phi-dihedral (which is easily checked).

This specification is completely local and independent of atom IDs, and also provides a unique proc owner for each CMAP interaction (the proc that owns the alpha carbon owns the CMAP).

So, the path of least resistance is to maintain the CMAP “data reader” but (1) immediately map the atom topology to internal “alpha-psi-phi-type” atom arrays; (2) write a post_neighbour function for reconstructing each CMAP’s local/ghost “atoms1-5” indices from said arrays after reneighbouring; (3) alter write_data to write out tags to data files in “the old style” from the new internal data structures.

In the long term, it would be relatively much more efficient to calculate CMAP as a dihedral, since the only effect of CMAP is to add to each dihedral’s force magnitude a term dependent on both dihedrals (and thus all CMAP force directions are parallel to dihedral force directions, and the geometric and trigonometric calculations can be reused). Using the “alpha-psi-phi-type” data structure, CMAP dihedrals can be evaluated as pairs and the cross-term added to both dihedrals’ forces extremely efficiently. The advantage is that, if some of the dihedral geometry code is put back into the dihedral base class rather than duplicated across dihedrals, you’ll get a much more robust code base and faster evaluations with fewer chances for errors in related terms (for example virials).

alphataubio · December 10, 2024, 11:26pm

thanks @srtee as always you have both the brute force and smart solution !

yes CMAP is the 1-5 term in the charmm forcefield. im not going to rewrite CMAP from scratch with a different geometry approach given that im already starting to drift away from charmm towards REAXFF and PACE.

i did resolve my issue, i was making a beginners mistake. my plan was to start with a charmmgui generated system with extra ions, delete some ions to achieve whatever concentration for a given simulation (easier than adding ions), minimize, save that as a dump. then in the next script, read that dump for multiple replicas (ie. partitions), reinitialize each replica with a different set of velocities as suggested by @giacomo.fiorin to get an ensemble of paths with different initial conditions.

the rather edge case (or corner case as some would say) of this CMAP bug after deleting atoms and resetting ids was just obscuring the inevitable outcome. my idea of reading a pre-minimized dump and changing the velocities of each replica was never going to work because resetting all the velocities without minimizing after created geometries with an insane pressure that quickly blew up (see the classic lost atoms problem…)

the solution is to, in this order:
(1) read charmmgui generated system (in each partition)
(2) create velocities BEFORE #3 (for each partition with different seed)
(3) delete some ions to achieve desired concentrations (for each partition with different seed)
(4) minimize (for each partition)
(5) run (for each partition)

in one continuous script.