Simulation slows down prohibitively

please keep the mailing list in the loop, so that the conversation is
not lost. we're chasing an elusive, hard-to-reproduce problem and that
is worth archiving. thanks.

As always, I appreciate your quick response.
That's really weird; I'm running it on a node of a cluster and I've tried
different nodes.
Using 16 MPI tasks (all the processors available on the node):
the slowdown happens at time step = 2700. Before the slowdown, each
iteration takes less than a tenth of a second, but right after time step =
2700, each iteration takes almost 30 seconds to run.
But when I changed the number of MPI tasks to 4 and also to 8, I'm not seeing
the slowdown anymore! At least it didn't happen up to time step = 15000. Do
you know what could spur this behavior?

ok. the 16 MPI tasks seem to be a requirement as well. i've been able
to trigger the behavior you observe by running with 16 MPI tasks on my
lowly laptop. the fix addforce line seems to accelerate it for some
strange reason. i can also see that the culprit is not fix rigid, or at
least not immediately, but rather that on one of the processors the
"multiplicity" array of the charmm dihedral style gets corrupted and
it then loops forever accumulating nonsense dihedral forces.
tracking this down will take a little while, since running under
valgrind is slooooooow, and running 16 of them on two cores even more
so. ;-)

axel.

ok. looks like i could track down the issue.

the condition to trigger the problem is that you have a deleted dihedral
migrating to an MPI rank that didn't have any deleted dihedrals
previously. LAMMPS will then construct the list of dihedrals
differently and thus try to compute the dihedral interaction even
though it was disabled. because the disabling itself is done by
negating the type, looking up the dihedral parameters of such a
deleted dihedral will result in reading bogus parameters, e.g. a
very large multiplicity, which is what ultimately slows down the
dihedral calculation so dramatically.

it likely was a dihedral in the c60, so the observation that it was
related to fix rigid was correct, but indirectly rather than directly.
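to make the mechanism concrete, here is a minimal standalone C++ sketch. this is not the actual LAMMPS code; the names (DihedralParams, build_dihedral_list, skip_disabled) and the stand-in garbage value are made up for illustration. it only shows how a negated dihedral type that slips past the list builder leads to a bogus multiplicity lookup:

// standalone sketch, not the actual LAMMPS code: dihedrals are "deleted"
// by negating their type, so the list builder has to skip negative types.
// if it does not, the parameter lookup indexes the multiplicity array with
// a bogus index and the per-term force loop suddenly runs over garbage.
#include <cstdio>
#include <vector>

struct DihedralParams {
  std::vector<int> multiplicity;   // indexed by dihedral type, 1..ntypes
};

// hypothetical list builder; skip_disabled mimics the corrected behavior
std::vector<int> build_dihedral_list(const std::vector<int> &types,
                                     bool skip_disabled) {
  std::vector<int> list;
  for (int t : types)
    if (!skip_disabled || t > 0)   // deleted dihedrals have a negated type
      list.push_back(t);
  return list;
}

int main() {
  DihedralParams params;
  params.multiplicity = {0, 2, 3};            // types 1 and 2 are legitimate
  std::vector<int> dihedral_types = {2, -1};  // one real, one deleted dihedral

  for (bool skip : {true, false}) {
    printf("skip_disabled = %d\n", skip);
    for (int type : build_dihedral_list(dihedral_types, skip)) {
      // with skip == false the negated type slips through; in the real code
      // the out-of-bounds read returns whatever happens to be in memory,
      // which can be a huge multiplicity. the fallback below only stands in
      // for that garbage value so the sketch stays well defined.
      int m = (type >= 1 && type < (int) params.multiplicity.size())
                  ? params.multiplicity[type]
                  : 999999;
      printf("  dihedral type %d -> multiplicity %d\n", type, m);
    }
  }
  return 0;
}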

please replace your neighbor.cpp file with the attached version,
recompile, test again and let us know.

axel.

neighbor.cpp.gz (13.8 KB)

Sweet, it runs like a charm now :-) ... Thank you so much, Axel, you're awesome.
A couple of observations:

  1. Based on what you described, I used the 'remove' keyword of the delete_bonds command to permanently remove all the bonds from the list, and it also worked with the old neighbor.cpp and didn't show the slowdown.

  2. I also tried to run the simulation with lammps-icms with a commit date of 2013-11-14, and that was working flawlessly too. The neighbor.cpp of that one is almost the same as this new one that you've sent, except for:

int on_or_off = bond_off;
MPI_Allreduce(&on_or_off,&bond_off,1,MPI_INT,MPI_MAX,world);
on_or_off = angle_off;
MPI_Allreduce(&on_or_off,&angle_off,1,MPI_INT,MPI_MAX,world);
on_or_off = dihedral_off;
MPI_Allreduce(&on_or_off,&dihedral_off,1,MPI_INT,MPI_MAX,world);
on_or_off = improper_off;
MPI_Allreduce(&on_or_off,&improper_off,1,MPI_INT,MPI_MAX,world);

Thank you again.

Cheers,
Kasra.

Sweet, it runs like a charm now :-) ... Thank you so much, Axel, you're
awesome.
A couple of observations:
1. Based on what you described, I used the 'remove' keyword of the delete_bonds
command to permanently remove all the bonds from the list, and it also worked
with the old neighbor.cpp and didn't show the slowdown.

yup.

2. I also tried to run the simulation with lammps-icms with a commit date
of 2013-11-14, and that was working flawlessly too. The neighbor.cpp of that
one is almost the same as this new one that you've sent, except for:

nope. that must be by accident, since the slowdown is due to reading
some random numbers from the stack. so with a modified code base you
will get different numbers, and if you happen to get zeros, it won't
slow down.

int on_or_off = bond_off;
MPI_Allreduce(&on_or_off,&bond_off,1,MPI_INT,MPI_MAX,world);
on_or_off = angle_off;
MPI_Allreduce(&on_or_off,&angle_off,1,MPI_INT,MPI_MAX,world);
on_or_off = dihedral_off;
MPI_Allreduce(&on_or_off,&dihedral_off,1,MPI_INT,MPI_MAX,world);
on_or_off = improper_off;
MPI_Allreduce(&on_or_off,&improper_off,1,MPI_INT,MPI_MAX,world);

yes. that is exactly the modification that i made. the "*_off" flags
need to be synchronized across all MPI ranks and they were not, so a
rank that had not seen any deleted dihedrals yet would build its
topology lists without checking for negated types.
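for anyone following along, here is a minimal standalone MPI sketch of why the MPI_MAX reduction makes the flags consistent. this is not the LAMMPS source; the rank-3 condition is just a made-up stand-in for "a deleted dihedral migrated onto this rank". as soon as any single rank has the flag set locally, the reduction gives every rank a 1, so they all build their topology lists the same way:

// standalone MPI sketch, not LAMMPS code: only one rank has set its local
// dihedral_off flag. without the Allreduce the ranks disagree about whether
// disabled dihedrals can exist; with MPI_MAX every rank ends up with 1.
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // hypothetical local state: only rank 3 saw a deleted (negated-type) dihedral
  int dihedral_off = (rank == 3) ? 1 : 0;

  // the same pattern as the fix quoted above
  int on_or_off = dihedral_off;
  MPI_Allreduce(&on_or_off, &dihedral_off, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);

  // prints 1 on every rank (when run with at least 4 ranks, e.g. mpirun -np 16)
  printf("rank %d: dihedral_off = %d\n", rank, dihedral_off);

  MPI_Finalize();
  return 0;
}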

ciao,
    axel.