unable to parallelize breakable bonds

Hello lammps people,

I have been working with LAMMPS for the past month or so, and this is my first email to the list.

I have implemented some simple potentials to simulate a very coarse DNA model: a dihedral, a hard-core repulsive pair interaction, and a harmonic-cosine breakable bond potential.

My simulations have run on single processors for many days with no problems and produced consistent results. Now I am attempting to run in parallel, with no luck. Any help is appreciated. Here is as much info as I believe is necessary.

Preliminary :

I am using the 30 Oct stable version plus my 3 potentials.
My simulations run fine on a single processor for 10+ days.
I am using real units.
There are 1600 particles in this particular system.

Runtime :

mpirun -np 6 lmp_murat -in mysim.in

ERROR on proc 3: Bond atoms 560 1041 missing on proc 3 at step 0 (../neigh_bond.cpp:65)

mpirun -np 4 lmp_murat -in mysim.in

ERROR on proc 0: Bond atoms 301 1300 missing on proc 0 at step 155614 (../neigh_bond.cpp:65)

lmp_murat -in mysim.in (single processor) runs fine!

So the error happens at the first step for 6 CPUs and at some arbitrary step for fewer CPUs. In either case there are more than 100 particles per CPU.

What I have tried that did not work (sketched as input lines below):

Dialing the timestep way down: from 75, which works nicely on a single processor, to 5
Frequent neighbor list rebuilds: neigh_modify delay 0 every 1 check no
Large neighbor list cutoff: neighbor nsq 50 (from 5, which works on a single processor)
Alternative decomposition: balance 1.2 rcb with comm_style tiled
Newton's 3rd law: newton on/off
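
For reference, roughly how those attempts look as input-script lines (values as described above; note that the neighbor command takes the skin distance first, then the style):

timestep      5                        # down from 75
neigh_modify  delay 0 every 1 check no
neighbor      50 nsq                   # skin 50 (5 works on a single processor)
comm_style    tiled
balance       1.2 rcb
# plus runs with both newton on and newton off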

The catch :

The relevant (missing) atoms are bound by harmonic-cosine bonds which I implemented myself.

My harmonic-cosine bond is breakable (it has a cutoff distance beyond which it does nothing), which means bonded atoms can end up arbitrarily far apart. The bond is in fact not only breakable but also re-bondable, which means I cannot simply remove bonds once they break. This is not a problem in a single-processor run, but I am beginning to think it could be for parallel runs.

I would appreciate any insight into the problem and suggestions to better implement such a breakable bond. I am attaching the source files for my 'hacks'.

Thank you very much

Murat

bond_harcos.cpp (6.68 KB)

dihedral_breakable.cpp (11.2 KB)

pair_hardcore.cpp (9.18 KB)

There is a fundamental issue with how you handle bond breakage that
collides with the domain decomposition that LAMMPS uses for MPI
parallelization. In your code bonds are not really broken, you just
turn off the force. That means bonds can get very long, and the two
bonded atoms can move so far apart that they end up in different
domains; at that point you get the error that you see. If you run with
only one processor there are no multiple domains, so the error does not
happen. You have to change the way you enable/disable bonds. The
convention in LAMMPS is to change the sign of the bond type. Please
check out fix bond/create and fix bond/break.
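
As a rough sketch of how the stock fixes are used (type numbers and distances below are placeholders, not values for your DNA model; check the docs for the exact keywords):

fix break all bond/break  10 1 12.0      # every 10 steps, break type-1 bonds longer than 12.0
fix make  all bond/create 10 1 1 10.0 1  # every 10 steps, create type-1 bonds between type-1 atoms closer than 10.0

Note that fix bond/create also needs room reserved for new bonds (e.g. "extra bond per atom" in the data file) and appropriate special_bonds settings.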

axel.

Thanks for the response.

Before I read your response I found out about the comm_modify command. Setting a very high communication cutoff seems like a step in the right direction for parallelizing this system, although it is probably not the right way to do it.

I will adapt 'fix bond/break' and 'fix bond/create' to my needs and combine them with a high communication cutoff to get things to move smoothly (this is an MD run after all). Consider this:

r_cutoff < bond-modify_cutoff < comm_cutoff

In this case, after the bond smoothly breaks at r_cutoff, its atoms will still be communicated up to the higher comm_cutoff, and the bond will be properly removed once it stretches beyond the bond-modify cutoff (with some buffer distance before the communication cutoff). I think this will work nicely.
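
Something along these lines (the numbers are only placeholders to show the ordering of the three cutoffs):

comm_modify cutoff 30.0             # comm_cutoff: ghost atoms are kept up to here
fix break all bond/break 10 1 20.0  # bond-modify_cutoff: remove the bond well inside the comm cutoff
# r_cutoff (~10), where the bond force smoothly goes to zero, stays inside the bond style itself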

But I would like to understand one thing: consider the worst-case scenario where I set the communication cutoff as big as the system, so that every CPU has ghost atoms for everything (with newton on). Does the extra cost here lie purely in communication?

I am not going to do this; I would only like to get a feel for it.

Thanks

Murat

No, the extra cost does not lie purely in communication.

Then you should do some benchmarks with different cutoffs to get a feel for it.
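
For example (placeholder numbers; 'ccut' is just a variable name), you can sweep the communication cutoff from the command line:

# in mysim.in:
comm_modify cutoff ${ccut}

# then time a short run with a few values:
mpirun -np 6 lmp_murat -in mysim.in -var ccut 20
mpirun -np 6 lmp_murat -in mysim.in -var ccut 40

and compare the timing breakdown (Pair, Bond, Neigh, Comm) that LAMMPS prints at the end of each run.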