fix deform with quartic bonds -- problem running in parallel

Dear LAMMPS Users,

I want to perform tensile deformation of a polymer network that has two bond types (quartic and harmonic), but I am experiencing a problem when running the simulation in parallel.

Below is a snippet of the script (a small test run) that applies uniaxial strain in the z direction while x and y are controlled to maintain zero pressure (3800 beads total):

units lj
atom_style bond
pair_style lj/cut 2.5
bond_style hybrid harmonic quartic
special_bonds lj 1 1 1
neighbor 0.4 bin
neigh_modify every 1 delay 0 check yes
velocity all create 1 54654 dist gaussian mom yes
timestep 0.001
fix fxext all deform 1000 z erate 1 remap x units box
fix fxnpt all npt temp 1 1 0.1 x 0 0 1 y 0 0 1
run 5000

(run using the following command: mpirun --mca btl vader,self -np 4 lmp_mpi < in.deform)

As prescribed by the commands above, strain is applied every 1000 steps with remap x (affine deformation). At step 1000, the given strain rate (erate 1) doubles the box length in z. I monitor the maximum bond length (compute bond/local dist, compute reduce max) and also dump all the bonds. As expected, just after the first deformation step, at step 1001, a few bonds break.
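For reference, the monitoring mentioned above can be set up along these lines (the compute IDs here are illustrative, not taken from my actual input):

```
# track the longest bond every step (IDs are illustrative)
compute cbl  all bond/local dist
compute maxb all reduce max c_cbl
thermo_style custom step lz c_maxb
thermo 1
```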

The above simulation runs perfectly fine on a single processor, but in parallel it terminates at step 1011 (10 steps after the bonds break) with a segmentation fault (Signal code: Address not mapped (1)). As Axel suggested in some previous posts, here is the error trace obtained with gdb (from one of the core.# output files):

#0 0x0000000000936926 in LAMMPS_NS::BondHybrid::compute(int, int) () at …/bond_hybrid.cpp:80
#1 0x000000000215bd7a in LAMMPS_NS::Verlet::run(int) () at …/verlet.cpp:315
#2 0x000000000210b395 in LAMMPS_NS::Run::command(int, char**) () at …/run.cpp:183
#3 0x0000000000f26786 in void LAMMPS_NS::Input::command_creator<LAMMPS_NS::Run>(LAMMPS_NS::LAMMPS*, int, char**) () at …/input.cpp:863
#4 0x0000000000f24bc1 in LAMMPS_NS::Input::execute_command() () at …/input.cpp:846
#5 0x0000000000f256e7 in LAMMPS_NS::Input::file() () at …/input.cpp:243
#6 0x0000000000f43796 in main () at …/main.cpp:64

I tried changing the neighbor-list attributes; increasing the cutoff increases the number of steps the simulation runs, but even for a really large cutoff it doesn’t run more than 1500 steps (with cutoff = 0.0 it terminates at step 1001, immediately after the bonds break).

I compared the output of every step up to 1010 (or the maximum number of steps reached in parallel) with the serial run:

        Serial                               Parallel

step    Lz    maxb   max_bondenergy    step    Lz    maxb   max_bondenergy
 999    16   1.082           10.83      999    16   1.082           10.83
1000    32   2.072            1030     1000    32   8.580         7201870
1001    32   1.499           74.33     1001    32   1.499           74.33

At step 1000 (the first deformation step), the serial run shows the maximum bond length doubling, while the parallel run shows a dramatically larger bond length. But just after step 1000, both runs give the same output (step 1001 onwards). Naively, it looks like a problem with rebuilding the bond list and neighbor list after bonds break in the domain-decomposition setup (which is probably what triggers the error in the BondHybrid::compute function noted in the trace). I tried many different things after reading previous posts on quartic bonds, but still couldn’t identify the problem. I would truly appreciate it if someone could help me with this error.

Thanks for your time in advance!

Akash

[…] At step 1000, the given strain rate (erate 1) doubles the box length in z. […]

this is a massive disruption. it doesn’t make much sense to do something like that (what is the physical justification), but it will require unusual settings to avoid all kinds of problems. for example, you will have to set a very large communication cutoff to not lose atoms/bonds during the expansion. and overall, i would not expect LAMMPS to be able to act sanely under such insanely extreme conditions. try something (much) less extreme.
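for reference, such a communication cutoff would be set via the comm_modify command; the value below is only illustrative (LJ units), not a recommendation:

```
# illustrative only: enlarge the ghost-atom communication cutoff
comm_modify cutoff 10.0
```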

axel.

Dear Axel,

Thank you very much for your reply!

I totally understand that this is an extremely high strain rate; consequently, at the first strain step approx. 40% of the bonds break. My intention was to see how LAMMPS takes care of bond breaking, so I applied this high strain in a short run. Very slow deformation doesn’t help either (I have also tried increasing the cutoff via comm_modify, but no luck).

I took a step back and have now tried to simulate the system without any deformation, i.e., only equilibration with a Langevin thermostat and nve time integration.

  1. System with all harmonic bonds works well.
  2. System with all quartic bonds works well.
  3. System with a combination of quartic + harmonic bonds gives a segmentation fault in parallel:
    *** Process received signal ***
    Signal: Segmentation fault (11)
    Signal code: Address not mapped (1)
    Failing at address: 0x56376d4b2e60
    [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f821c8db890]
    [ 1] home/lammps-stable/src/lmp_ubuntu_simple(_ZN9LAMMPS_NS10BondHybrid7computeEii+0xc6)[0x5636b2f4a0d6]
    [ 2] /home//lammps-stable/src/lmp_ubuntu_simple(_ZN9LAMMPS_NS6Verlet3runEi+0x23a)[0x5636b379ef6a]
    [ 3] /home//lammps-stable/src/lmp_ubuntu_simple(_ZN9LAMMPS_NS3Run7commandEiPPc+0x891)[0x5636b3398131]
    [ 4] /home//lammps-stable/src/lmp_ubuntu_simple(_ZN9LAMMPS_NS5Input15command_creatorINS_3RunEEEvPNS_6LAMMPSEiPPc+0x3e)[0x5636b35e572e]
    [ 5] /home//lammps-stable/src/lmp_ubuntu_simple(_ZN9LAMMPS_NS5Input15execute_commandEv+0xa01)[0x5636b35e3821]
    [ 6] /home//lammps-stable/src/lmp_ubuntu_simple(_ZN9LAMMPS_NS5Input4fileEv+0x28a)[0x5636b35e42fa]
    [ 7] /home//lammps-stable/src/lmp_ubuntu_simple(main+0x48)[0x5636b2a95788]
    [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f821b688b97]
    [ 9] /home/lammps-stable/src/lmp_ubuntu_simple(_start+0x2a)[0x5636b2a9871a]
    *** End of error message ***

Dear Axel,

Thank you very much for your reply!

[…]

I tried FENE instead of harmonic, but any bond combination that includes quartic gives the above error. I also tried varying the force-field parameters, but no luck.

Another thing I noticed in these tests is that, with the same input files, the segmentation fault occurs randomly. Sometimes the simulation stops after ~10 steps, sometimes after ~500, and sometimes it even goes through 100K steps. The randomness also varies with the number of processors: runs on 2, 4, or 8 processors go through different numbers of steps before giving the segfault. Because of all this, I looked into other causes of segmentation faults with OpenMPI (both on the LAMMPS mailing list and elsewhere) and learned about hwloc for allocating processors during domain decomposition. I have OpenMPI 3.1.2, which should have internal hwloc, so I guess that should not be a problem. Here are my machine details:

hwloc has definitely nothing to do with it.

Ubuntu 18.04
Intel Xeon W Processors
Openmpi 3.1.2, gcc 7.3.0

lammps compiled as ubuntu_simple

what is the version of LAMMPS you are using?

I have also tried running on different machines but still get the segmentation fault. Would you recommend trying KOKKOS with the hwloc functionality? I will also keep checking my initial condition (equilibrating more), but can this segfault occur due to the initial condition? (Although the same initial condition runs fine with lmp_serial.) Could you please give me some suggestions to try based on the above information? Also, please let me know if you need any additional information.

no. you are not looking at this scientifically. as you stated before, the constant in your failures is using a quartic bond style as a substyle of a hybrid bond. this suggests that bond style hybrid knows nothing about the special way “broken” bonds are treated in bond style quartic and thus gets confused when building the lists of bonds for its substyles. however, it is difficult to check this without the means to reproduce it. your stack traces are not very helpful, either, since your executables were compiled without the -g flag and thus do not provide line numbers (which also requires knowing the exact version of LAMMPS to be of use).

axel.

Dear Axel,

Thanks again for your reply!

I am using the latest version, 22Aug18. I have tried the previous version (from March) as well, but no luck. I think I compiled LAMMPS with the -g flag, but I will check again.

I got sidetracked toward the computational aspect (hwloc) because the errors appear very randomly: the simulation sometimes runs 100K steps after bonds break, but sometimes stops right after the bonds break, and that too varies with the number of processors I use.

I have attached my input files, and here is what I just tried (running 500K steps). I ran the same input files (same seeds and everything) 5 times:

with np 2:

  1. 3 times it stopped at step 56 (just after 2 bonds break).

  2. 1 time it stopped at step 198K.

  3. 1 time, it actually ran the full 500K steps!

with np 4:

  1. 2 times it stopped at step 56 (just after bond breaking).

  2. 1 time it stopped at step 206.

  3. 1 time it stopped at step 106K.

  4. 1 time, it ran the full 500K steps.

Because of all this, I really don’t know what’s happening, and it makes me think it might be a problem with my machine. I would truly appreciate it if you could look at my input files and let me know if I am making any trivial mistake.

Thanks,
Akash

Postdoc,

ChemE, MIT

polymers.txt (652 KB)

in.langevin (1.88 KB)

[…] I am using the latest version, 22Aug18. I have tried the previous version (from March) as well, but no luck. I think I compiled LAMMPS with the -g flag, but I will check again.

the latest version of LAMMPS is 9Nov18; what you have is the latest stable version of LAMMPS.
but it doesn’t make a difference. thanks to the input example you provided, i could confirm that my suspicion about the cause of the issue was correct, and i have included a correction for it below.

[…] the errors appear very randomly: the simulation sometimes runs 100K steps after bonds break, but sometimes stops right after the bonds break, and that too varies with the number of processors I use.

[…] I ran the same input files (same seeds and everything) 5 times:

yes. due to bond style quartic doing something that LAMMPS will not normally allow (assigning a bond type of 0), the code in bond style hybrid uses uninitialized data to look up entries in an array. this data can be zero or some crazy random number. if it is zero, the simulation keeps going, but produces correct or incorrect results depending on the order in which you define your hybrid sub-styles. if it is large, you will get segmentation faults.

the patch/diff pasted below detects when you use a quartic bond style as a hybrid sub-style and then initializes the otherwise uninitialized data for bond type 0 to the index of the quartic sub-style.

we will add this change to the next LAMMPS patch release. please keep us posted on whether this cures the problem for you.

axel.

Dear Axel,

Thanks a lot for providing the patch and explanation! I now understand the source of randomness.

This modification works! I tried it with deformation as well, and it works fine (at a very large step it reports missing bond atoms, but that is probably due to the initial condition). I also tried it with the extreme deformation rate (posted in my first email in this thread), and that also works well for a short time (though, yes, with a very large comm_modify cutoff).

(just to be precise: the following statement appears twice in the above patch: if (strcmp(arg[i],“none”) == 0) …cannot have none as an argument"); )

Thank you very much for all your help! Also, thanks to the whole team of LAMMPS for creating this awesome simulator!

Regards,
Akash

[…] This modification works! […]

(just to be precise: the following statement appears twice in the above patch: if (strcmp(arg[i],“none”) == 0) …cannot have none as an argument"); )

thanks for the feedback and notifying me of the (harmless) cut-n-paste goof. i’ll fix it in the pull request to be included in the next LAMMPS version.

axel.