delete_bonds peculiar behavior

Hi All,

I'm seeing an inconsistent behavior when I use "delete_bonds". I posed a question regarding this issue while I was working on load balancing of a simulation, which is available at:
http://lammps.sandia.gov/threads/msg56632.html

The problem is the same one that I described in the above thread, i.e. a big colloidal solute (10 nm in diameter) is in water which covers both sides of a solid polymeric wall (see attached figure). In order to keep the wall in place, I define a block within the wall (with a width of 5 angstroms and the same xy extent as the wall itself), I exclude this block from integration, and I also delete the bonds within this block to eliminate their potential contributions to the simulation. However, deleting bonds is where I see some peculiar behavior which is not identically reproducible each time that I run the system! If there is no colloidal particle in the system and I'm not setting neighbor lists and cutoffs to satisfy the colloidal interaction requirements, delete_bonds works just fine. However, in the presence of the colloidal particle, each time that I run the simulation I see one of the following behaviors sporadically:

-- Sometimes it smoothly passes "delete_bonds" and the simulation goes as it should.
-- Sometimes it gets stuck forever in "Deleting bonds ..." mode, as I reported in (http://lammps.sandia.gov/threads/msg56632.html).
-- Sometimes it aborts the job with the following errors:

== When I was using LAMMPS (11 Sep 2015-ICMS), as I reported in (http://lammps.sandia.gov/threads/msg56632.html):

        Deleting bonds ...
        ERROR on proc 79: Failed to reallocate 1966080 bytes for array atom:x (../memory.cpp:66)
        [cli_79]: aborting job:
        application called MPI_Abort(MPI_COMM_WORLD, 1) - process 79

== Now I'm using LAMMPS (13 Jan 2016-ICMS), and when I get errors they are one of the following:

        Deleting bonds ...
        ERROR on proc 139: Failed to reallocate 1310720 bytes for array atom:bond_atom (../memory.cpp:66)
        [cli_139]: aborting job:
        application called MPI_Abort(MPI_COMM_WORLD, 1) - process 139

OR

        Deleting bonds ...
        ERROR on proc 29: Failed to reallocate 1572864 bytes for array atom:nspecial (../memory.cpp:66)
        [cli_29]: aborting job:
        application called MPI_Abort(MPI_COMM_WORLD, 1) - process 29

I have also seen the same errors but arising from "atom:improper_atom" or "atom:dihedral_atom".

I tried to make a working example with a simplified version of my simulation, but I was not able to reproduce the error for smaller systems. I have no clue what triggers such behavior: is it a LAMMPS issue? Or something with the cluster that I'm running on? Or my input file parameter settings? I'm attaching the input file that I use for further clarification of what I'm doing; the data file for the system is huge (almost 70 MB), but I can provide it if it's needed for further investigation.

Best,
Kasra.
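(In LAMMPS terms, the wall-anchoring setup described above typically amounts to something like the following sketch; the region bounds and group names here are placeholders, and the actual commands are in the attached input file.)

        # hypothetical sketch of the frozen slab inside the wall
        region  anchor block INF INF INF INF 20.0 25.0 units box   # ~5 A thick slab (placeholder coords)
        group   frozen region anchor
        group   mobile subtract all frozen
        delete_bonds frozen multi remove    # drop bonded interactions fully inside the frozen slab
        fix     integrate mobile nve        # integrate only the mobile atoms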

field_snapshot.jpg

delete_bonds-issue.in (2.92 KB)

Hi All,

I'm seeing an inconsistent behavior when I use "delete_bonds". I posed a
question regarding this issue while I was working on load balancing of a
simulation which is available in:
http://lammps.sandia.gov/threads/msg56632.html

The problem is the same one that I described in the above thread, i.e. a big
colloidal solute (10nm in diameter) is in water which is covering both sides
of a solid polymeric wall (see attached figure). In order to keep the wall
in place, I define a block within the wall (with a width of 5 angstroms and
its xy extent the same as the wall itself) and I exclude this block
from integration and I also delete the bonds within this block to eliminate
their potential contributions to the simulation. However, deleting bonds is
where I see some peculiar behavior which is not identically reproducible
each time that I run the system! If there is no colloidal particle in the
system and I'm not setting neighbor lists and cutoffs to satisfy the
colloidal interaction requirements, delete_bonds works just fine. However,
in the presence of the colloidal particle each time that I run the
simulation I see the following behaviors sporadically:

-- Sometimes it smoothly passes "delete_bonds" and the simulation goes as
it should
-- Sometimes it gets stuck forever in "Deleting bonds ..." mode as I
reported in (http://lammps.sandia.gov/threads/msg56632.html).
-- Sometimes it aborts the job by giving the following errors:

        == When I was using LAMMPS (11 Sep 2015-ICMS) as I reported in
(http://lammps.sandia.gov/threads/msg56632.html) :

                :
                :
                Deleting bonds ...
                ERROR on proc 79: Failed to reallocate 1966080 bytes for
array atom:x (../memory.cpp:66)
                [cli_79]: aborting job:
                application called MPI_Abort(MPI_COMM_WORLD, 1) - process 79

        == Now I'm using LAMMPS (13 Jan 2016-ICMS), and when I get errors
they are one of the following:

               ** Deleting bonds ...
                    ERROR on proc 139: Failed to reallocate 1310720 bytes
for array atom:bond_atom (../memory.cpp:66)
                    [cli_139]: aborting job:
                    application called MPI_Abort(MPI_COMM_WORLD, 1) -
process 139

             OR

             ** Deleting bonds ...
                   ERROR on proc 29: Failed to reallocate 1572864 bytes for
array atom:nspecial (../memory.cpp:66)
                   [cli_29]: aborting job:
                   application called MPI_Abort(MPI_COMM_WORLD, 1) - process
29

            OR
                  I have also seen errors the same as above but arising from
"atom:improper_atom" or "atom:dihedral_atom".

all of these messages mean that LAMMPS tries to allocate another 1-2 MB of
memory and fails.
your job is running out of RAM (or rather address space).

I tried to make a working example with a simplified version of my simulation
but I was not able to reproduce the error for smaller systems. I have no
clue what triggers such behavior: is it a LAMMPS issue? or something with

no surprise. smaller systems require less RAM.

the cluster that I'm running on? or my input file parameter settings? I'm
attaching the input file that I use for further clarification of what I'm
doing, however, the data file for the system is huge (almost 70 MB) but I
can provide it if it's needed for further investigations.

what you *could* do is to run a smaller test system with valgrind and
see if there are any (relevant) memory leaks.

axel.

I'm guessing what is different when you have vs do not have the colloid
particle is your cutoffs. You have a max cutoff of 65 Angstroms with the
colloid, which means huge numbers of ghost atoms must be stored, huge
per-atom arrays, etc. That is likely why you are running out of memory.
All of this should be fine if you use more procs or run a smaller problem.

I imagine delete_bonds is croaking not because of anything delete_bonds
itself does, but because it first acquires ghost atoms, just like a run
command would do. To delete bonds you only need a cutoff long enough to
encompass the longest bond. So you could define a simple lj/cut potential
for everything with a short cutoff before you delete bonds, delete the
bonds, then re-define the pair style and cutoffs for your real model.
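For illustration, a minimal sketch of that sequence might look like the
following (the cutoff, the dummy coefficients, and the "frozen" group name
are placeholders, not values from the actual input file):

        # temporary cheap pair style so delete_bonds only needs a thin ghost-atom shell
        pair_style   lj/cut 10.0
        pair_coeff   * * 0.0 3.0           # dummy coefficients; nothing is run with them
        delete_bonds frozen multi remove   # "frozen" = hypothetical group inside the wall

        # afterwards, re-issue the original pair_style / pair_coeff lines
        # (with the long colloid cutoffs) before any run command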

I am guessing you will then have the same problem as the 1st paragraph
above with memory, etc. when you perform a run command with the huge
cutoffs.

I'll also note that I don't see any reason to delete a few bonds inside a
frozen block of atoms. Their computational cost is insignificant in your
model, and they don't change anything if the atoms they are attached to
are frozen anyway.

Steve

Thank you Axel and Steve for your prompt answers.

Steve:
I can leave the bonds active, since they sit in a small block of the simulation box and their computational cost is not significant; however, I also wanted to eliminate the contribution of these bonds' potential to the system virial in order to have an unadulterated estimate of the pressure of the system.
I don't see any issue when I take out the "delete_bonds" command; the simulation always proceeds without getting stuck or giving any memory-related errors.
I'm going to test a smaller system with valgrind, as Axel suggested, to see if I can spot any memory leak in the delete_bonds part.
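(As an aside, and not something suggested in the thread: if the only goal is to keep the bond terms out of the reported pressure, LAMMPS can also report a pressure built from selected virial contributions. A hypothetical sketch, with placeholder compute IDs:)

        compute       myT all temp
        compute       pNoBond all pressure myT ke pair kspace   # sum only kinetic, pair, and kspace terms
        thermo_style  custom step temp press c_pNoBond          # report both the full and the reduced pressure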

Thank you,
Kasra.

Axel,

I tried running LAMMPS with valgrind. I have no problem running it on my machine, but when I try it on the cluster where I usually run my simulations, it exits with the following message:

==18477== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==18477== Command: ./lmp_13Jan16 -in in.melt
==18477==

Please verify that both the operating system and the processor support Intel® F16C instructions.

==18477==
==18477== HEAP SUMMARY:
==18477== in use at exit: 0 bytes in 0 blocks
==18477== total heap usage: 88 allocs, 88 frees, 73,579 bytes allocated

Although it's not a direct LAMMPS question, I thought you may have experienced such behavior and know a workaround, or could point me in the right direction in resolving it; otherwise please disregard this message :-)

Thank You,
Kasra.

Thank you Axel and Steve for your prompt answers.

Steve:
I can leave the bonds active, since they sit in a small block of the simulation
box and their computational cost is not significant; however, I also wanted
to eliminate the contribution of these bonds' potential to the system virial
in order to have an unadulterated estimate of the pressure of the system.
I don't see any issue when I take out the "delete_bonds" command; the simulation
always proceeds without getting stuck or giving any memory-related errors.
I'm going to test a smaller system with valgrind, as Axel suggested, to see
if I can spot any memory leak in the delete_bonds part.

considering steve's explanation, i would rule out a memory leak.
the following workaround makes much more sense:
- change the colloid pair style and pair coeff settings, so they
temporarily have a cutoff of 12 Å including the global cutoff
- delete the bonds you want to get rid of
- restore the previous settings.

axel.

Axel,

I tried running LAMMPS with valgrind. I have no problem running it on my
machine, but when I try it on the cluster where I usually run my simulations,
it exits with the following message:

==18477== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==18477== Command: ./lmp_13Jan16 -in in.melt
==18477==

Please verify that both the operating system and the processor support
Intel(R) F16C instructions.

==18477==
==18477== HEAP SUMMARY:
==18477== in use at exit: 0 bytes in 0 blocks
==18477== total heap usage: 88 allocs, 88 frees, 73,579 bytes allocated

Although it's not a direct LAMMPS question, I thought you may have
experienced such behavior and know a workaround, or could point me in the
right direction in resolving it; otherwise please disregard this message :-)

if you cannot find a leak on your desktop, you won't find it on the
cluster either.

axel.

Hi All,

After employing your suggestion and running the code with valgrind, I found no apparent memory leaks. However, I tried testing the different components that are linked to the executable, and I found that the culprit was the MPI library. Basically, I was using MVAPICH2 (the default MPI library on the cluster I run on) to compile LAMMPS, but then I tried compiling LAMMPS with OpenMPI, and that's when I saw that there are no such errors or hangs as before. However, I see a ~20% degradation in the performance of the simulation, which I assume is due to the configure flags used for the MPI library (?).

But then the baffling issue is that I could use the LAMMPS GPU package when I was using MVAPICH2, but the new compilation of LAMMPS with OpenMPI doesn't like GPU runs! When I test it with the LAMMPS examples/accelerate/in.lj input, it crashes with the following message:

mpirun noticed that process rank 1 with PID 57394 on node qb030 exited on signal 11 (Segmentation fault).

I know this is a very vague error message, but that's the only thing I get from lammps-openmpi-1.10.1 in a GPU test! At first I thought that OpenMPI should have been configured with "--with-cuda" to resolve the issue, but recompiling OpenMPI with "--with-cuda" didn't help either!

Version information:
- LAMMPS 13 Jan 2016-ICMS
- MVAPICH2 version 2.0
- MVAPICH2 version 2.2b
- OpenMPI version 1.8.1
- OpenMPI version 1.10.1

I would appreciate it if you could help me to resolve the issues.

Best,
Kasra.