delete_bonds peculiar behavior

Hi All,

I'm seeing an inconsistent behavior when I use "delete_bonds". I posed a question regarding this issue while I was working on load balancing of a simulation, which is available at:
http://lammps.sandia.gov/threads/msg56632.html

The problem is the same one that I described in the above thread, i.e. a big colloidal solute (10 nm in diameter) is in water which covers both sides of a solid polymeric wall (see attached figure). In order to keep the wall in place, I define a block within the wall (with a width of 5 angstroms and the same xy extent as the wall itself), I exclude this block from integration, and I also delete the bonds within this block to eliminate their potential contributions to the simulation. However, deleting bonds is where I see some peculiar behavior which is not identically reproducible each time that I run the system! If there is no colloidal particle in the system and I'm not setting neighbor lists and cutoffs to satisfy the colloidal interaction requirements, delete_bonds works just fine. However, in the presence of the colloidal particle, each time that I run the simulation I see one of the following behaviors sporadically:

-- Sometimes it smoothly passes "delete_bonds" and the simulation goes as it should.
-- Sometimes it gets stuck forever in "Deleting bonds ..." mode, as I reported in (http://lammps.sandia.gov/threads/msg56632.html).
-- Sometimes it aborts the job with the following errors:

== When I was using LAMMPS (11 Sep 2015-ICMS), as I reported in (http://lammps.sandia.gov/threads/msg56632.html):

        Deleting bonds ...
        ERROR on proc 79: Failed to reallocate 1966080 bytes for array atom:x (../memory.cpp:66)
        [cli_79]: aborting job:
        application called MPI_Abort(MPI_COMM_WORLD, 1) - process 79

== Now I'm using LAMMPS (13 Jan 2016-ICMS), and when I get errors they are one of the following:

        Deleting bonds ...
        ERROR on proc 139: Failed to reallocate 1310720 bytes for array atom:bond_atom (../memory.cpp:66)
        [cli_139]: aborting job:
        application called MPI_Abort(MPI_COMM_WORLD, 1) - process 139

OR

        Deleting bonds ...
        ERROR on proc 29: Failed to reallocate 1572864 bytes for array atom:nspecial (../memory.cpp:66)
        [cli_29]: aborting job:
        application called MPI_Abort(MPI_COMM_WORLD, 1) - process 29

I have also seen the same errors but arising from "atom:improper_atom" or "atom:dihedral_atom".

I tried to make a working example with a simplified version of my simulation, but I was not able to reproduce the error for smaller systems. I have no clue what triggers such behavior: is it a LAMMPS issue? Or something with the cluster that I'm running on? Or my input file parameter settings? I'm attaching the input file that I use for further clarification of what I'm doing; the data file for the system is huge (almost 70 MB), but I can provide it if it's needed for further investigation.

Best,
Kasra.
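(In LAMMPS terms, the wall-anchoring setup described above typically amounts to something like the following sketch; the region bounds and group names here are placeholders, and the actual commands are in the attached input file.)

        # hypothetical sketch of the frozen slab inside the wall
        region  anchor block INF INF INF INF 20.0 25.0 units box   # ~5 A thick slab (placeholder coords)
        group   frozen region anchor
        group   mobile subtract all frozen
        delete_bonds frozen multi remove    # drop bonded interactions fully inside the frozen slab
        fix     integrate mobile nve        # integrate only the mobile atoms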

field_snapshot.jpg

delete_bonds-issue.in (2.92 KB)

Hi All,

I'm seeing an inconsistent behavior when I use "delete_bonds". I posed a
question regarding this issue while I was working on load balancing of a
simulation which is available in:
http://lammps.sandia.gov/threads/msg56632.html

The problem is the same one that I described in the above thread, i.e. a big
colloidal solute (10nm in diameter) is in water which is covering both sides
of a solid polymeric wall (see attached figure). In order to keep the wall
in place, I define a block within the wall (with a width of 5 angstroms and
its xy extent the same as the wall itself) and I exclude this block
from integration and I also delete the bonds within this block to eliminate
their potential contributions to the simulation. However, deleting bonds is
where I see some peculiar behavior which is not identically reproducible
each time that I run the system! If there is no colloidal particle in the
system and I'm not setting neighbor lists and cutoffs to satisfy the
colloidal interaction requirements, delete_bonds works just fine. However,
in the presence of the colloidal particle each time that I run the
simulation I see the following behaviors sporadically:

-- Sometimes it smoothly passes "delete_bonds" and the simulation goes as
it should
-- Sometimes it gets stuck forever in "Deleting bonds ..." mode as I
reported in (http://lammps.sandia.gov/threads/msg56632.html).
-- Sometimes it aborts the job by giving the following errors:

        == When I was using LAMMPS (11 Sep 2015-ICMS) as I reported in
(http://lammps.sandia.gov/threads/msg56632.html) :

                :
                :
                Deleting bonds ...
                ERROR on proc 79: Failed to reallocate 1966080 bytes for
array atom:x (../memory.cpp:66)
                [cli_79]: aborting job:
                application called MPI_Abort(MPI_COMM_WORLD, 1) - process 79

        == Now I'm using LAMMPS (13 Jan 2016-ICMS), and when I get errors
they are one of the following:

               ** Deleting bonds ...
                    ERROR on proc 139: Failed to reallocate 1310720 bytes
for array atom:bond_atom (../memory.cpp:66)
                    [cli_139]: aborting job:
                    application called MPI_Abort(MPI_COMM_WORLD, 1) -
process 139

             OR

             ** Deleting bonds ...
                   ERROR on proc 29: Failed to reallocate 1572864 bytes for
array atom:nspecial (../memory.cpp:66)
                   [cli_29]: aborting job:
                   application called MPI_Abort(MPI_COMM_WORLD, 1) - process
29

            OR
                  I have also seen errors the same as above but arising from
"atom:improper_atom" or "atom:dihedral_atom".

all of these messages mean that LAMMPS tries to allocate another 1-2 MB of
memory and fails.
your job is running out of RAM (or rather address space).

I tried to make a working example with a simplified version of my simulation
but I was not able to reproduce the error for smaller systems. I have no
clue what triggers such behavior: is it a LAMMPS issue? or something with

no surprise. smaller systems require less RAM.

the cluster that I'm running on? or my input file parameter settings? I'm
attaching the input file that I use for further clarification of what I'm
doing, however, the data file for the system is huge (almost 70 MB) but I
can provide it if it's needed for further investigations.

what you *could* do is to run a smaller test system with valgrind and
see if there are any (relevant) memory leaks.

axel.

I'm guessing what is different when you have vs do not have the colloid
particle is your cutoffs. You have a max cutoff of 65 Angstroms with the
colloid, which means huge numbers of ghost atoms must be stored, huge
per-atom arrays, etc. That is likely why you are running out of memory.
All of this should be fine if you use more procs or run a smaller problem.

I imagine delete_bonds is croaking not because of anything delete_bonds
itself does, but because it first acquires ghost atoms, just like a run
command would do. To delete bonds you only need a cutoff long enough to
encompass the longest bond. So you could define a simple lj/cut potential
for everything with a short cutoff before you delete bonds, delete the
bonds, then re-define the pair style and cutoffs for your real model.
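For illustration, a minimal sketch of that sequence might look like the
following (the cutoff, the dummy coefficients, and the "frozen" group name
are placeholders, not values from the actual input file):

        # temporary cheap pair style so delete_bonds only needs a thin ghost-atom shell
        pair_style   lj/cut 10.0
        pair_coeff   * * 0.0 3.0           # dummy coefficients; nothing is run with them
        delete_bonds frozen multi remove   # "frozen" = hypothetical group inside the wall

        # afterwards, re-issue the original pair_style / pair_coeff lines
        # (with the long colloid cutoffs) before any run command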

I am guessing you will then have the same problem as the 1st paragraph
above with memory, etc. when you perform a run command with the huge
cutoffs.

I'll also note that I don't see any reason to delete a few bonds inside a
frozen block of atoms. Their computational cost is insignificant in your
model, and they don't change anything if the atoms they are attached to
are frozen anyway.

Steve

Thank you Axel and Steve for your prompt answers.

Steve:
I can leave the bonds active, since they sit in a small block of the simulation box and their computational cost is not significant; however, I also wanted to eliminate the contribution of these bonds' potential to the system virial in order to have an unadulterated estimate of the pressure of the system.
I don't see any issue when I take out the "delete_bonds" command; the simulation always proceeds without getting stuck or giving any memory-related errors.
I'm going to test a smaller system with valgrind, as Axel suggested, to see if I can spot any memory leak in the delete_bonds part.
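(As an aside, and not something suggested in the thread: if the only goal is to keep the bond terms out of the reported pressure, LAMMPS can also report a pressure built from selected virial contributions. A hypothetical sketch, with placeholder compute IDs:)

        compute       myT all temp
        compute       pNoBond all pressure myT ke pair kspace   # sum only kinetic, pair, and kspace terms
        thermo_style  custom step temp press c_pNoBond          # report both the full and the reduced pressure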

Thank you,
Kasra.

Axel,

I tried running LAMMPS with valgrind. I have no problem running it on my machine, but when I try it on the cluster where I usually run my simulations, it exits with the following message:

==18477== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==18477== Command: ./lmp_13Jan16 -in in.melt
==18477==

Please verify that both the operating system and the processor support Intel® F16C instructions.

==18477==
==18477== HEAP SUMMARY:
==18477== in use at exit: 0 bytes in 0 blocks
==18477== total heap usage: 88 allocs, 88 frees, 73,579 bytes allocated

Although it's not a direct LAMMPS question, I thought you may have experienced such behavior and know a workaround, or could point me in the right direction in resolving it; otherwise please disregard this message :-)

Thank You,
Kasra.

Thank you Axel and Steve for your prompt answers.

Steve:
I can leave the bonds active, since they sit in a small block of the simulation
box and their computational cost is not significant; however, I also wanted
to eliminate the contribution of these bonds' potential to the system virial
in order to have an unadulterated estimate of the pressure of the system.
I don't see any issue when I take out the "delete_bonds" command; the simulation
always proceeds without getting stuck or giving any memory-related errors.
I'm going to test a smaller system with valgrind, as Axel suggested, to see
if I can spot any memory leak in the delete_bonds part.

considering steve's explanation, i would rule out a memory leak.
the following workaround makes much more sense:
- change the colloid pair style and pair coeff settings, so they
temporarily have a cutoff of 12 Å including the global cutoff
- delete the bonds you want to get rid of
- restore the previous settings.

axel.

Axel,

I tried running LAMMPS with valgrind. I have no problem running it on my
machine, but when I try it on the cluster where I usually run my simulations,
it exits with the following message:

==18477== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==18477== Command: ./lmp_13Jan16 -in in.melt
==18477==

Please verify that both the operating system and the processor support
Intel(R) F16C instructions.

==18477==
==18477== HEAP SUMMARY:
==18477== in use at exit: 0 bytes in 0 blocks
==18477== total heap usage: 88 allocs, 88 frees, 73,579 bytes allocated

Although it's not a direct LAMMPS question, I thought you may have
experienced such behavior and know a workaround, or could point me in the
right direction in resolving it; otherwise please disregard this message :-)

if you cannot find a leak on your desktop, you won't find it on the
cluster either.

axel.

Hi All,

After employing your suggestion and running the code with valgrind, I found no apparent memory leaks. However, I tried testing the different components that are linked to the executable, and I found that the culprit was the MPI library. Basically, I was using MVAPICH2 (the default MPI library on the cluster I run on) to compile LAMMPS, but then I tried compiling LAMMPS with OpenMPI, and that's when I saw that there are no such errors or hangs as before. However, I see a ~20% degradation in the performance of the simulation, which I assume is due to the configure flags used for the MPI library (?).

But then the baffling issue is that I could use the LAMMPS GPU package when I was using MVAPICH2, but the new compilation of LAMMPS with OpenMPI doesn't like GPU runs! When I test it with the LAMMPS examples/accelerate/in.lj input, it crashes with the following message:

mpirun noticed that process rank 1 with PID 57394 on node qb030 exited on signal 11 (Segmentation fault).

I know this is a very vague error message, but that's the only thing I get from lammps-openmpi-1.10.1 in a GPU test! At first I thought that OpenMPI should have been configured with "--with-cuda" to resolve the issue, but recompiling OpenMPI with "--with-cuda" didn't help either!

Version information:
- LAMMPS 13 Jan 2016-ICMS
- MVAPICH2 version 2.0
- MVAPICH2 version 2.2b
- OpenMPI version 1.8.1
- OpenMPI version 1.10.1

I would appreciate it if you could help me to resolve the issues.

Best,
Kasra.