[lammps-users] error message *** glibc detected ***

Dear LAMMPS developers,

     I ran an NVT calculation of two defective tubes welding. The job crashed
after ~130000 steps. The error seems to be related to memory allocation. I
have copied the error messages below and attached my input file. Can anyone
point out what is going wrong?

thanks
haibin

Haibin Chen Ph.D.
Mechanical Engineering Dept
Carnegie Mellon University
Pittsburgh, Pa, 15213

Copy of the error message:

......

  129500 993.98788 -18192.26 -18526.188*** glibc detected ***
./lmp_serial: double free or corruption (!prev): 0x0000000000860ac0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3010270412]
/lib64/libc.so.6(cfree+0x8c)[0x3010273b1c]
./lmp_serial[0x4feb57]
./lmp_serial[0x47855c]
./lmp_serial[0x46d27b]
./lmp_serial[0x520e95]
./lmp_serial[0x5b40da]
./lmp_serial[0x59d290]
./lmp_serial[0x4f80ab]
./lmp_serial[0x4f8bea]
./lmp_serial[0x4fea73]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x301021dab4]
./lmp_serial(__gxx_personality_v0+0x129)[0x401bd9]
======= Memory map: ========
00400000-00607000 r-xp 00000000 00:1c 1492942
/net/ntpl/home/haibin/lammps/lammps-8Feb10/My_Study/TWO_TUBE_H/welding/nvt/1000K/lmp_serial
00807000-00808000 rw-p 00207000 00:1c 1492942
/net/ntpl/home/haibin/lammps/lammps-8Feb10/My_Study/TWO_TUBE_H/welding/nvt/1000K/lmp_serial
00808000-009b4000 rw-p 00808000 00:00 0
[heap]
300fa00000-300fa1a000 r-xp 00000000 08:03 32564
/lib64/ld-2.6.so
300fc1a000-300fc1b000 r--p 0001a000 08:03 32564
/lib64/ld-2.6.so
300fc1b000-300fc1c000 rw-p 0001b000 08:03 32564
/lib64/ld-2.6.so
3010200000-3010347000 r-xp 00000000 08:03 32574
/lib64/libc-2.6.so
3010347000-3010547000 ---p 00147000 08:03 32574
/lib64/libc-2.6.so
3010547000-301054b000 r--p 00147000 08:03 32574
/lib64/libc-2.6.so
301054b000-301054c000 rw-p 0014b000 08:03 32574
/lib64/libc-2.6.so
301054c000-3010551000 rw-p 301054c000 00:00 0
3010600000-3010682000 r-xp 00000000 08:03 32602
/lib64/libm-2.6.so
3010682000-3010881000 ---p 00082000 08:03 32602
/lib64/libm-2.6.so
3010881000-3010882000 r--p 00081000 08:03 32602
/lib64/libm-2.6.so
3010882000-3010883000 rw-p 00082000 08:03 32602
/lib64/libm-2.6.so
3cfd800000-3cfd80d000 r-xp 00000000 08:03 32594
/lib64/libgcc_s-4.1.2-20070925.so.1
3cfd80d000-3cfda0d000 ---p 0000d000 08:03 32594
/lib64/libgcc_s-4.1.2-20070925.so.1
3cfda0d000-3cfda0e000 rw-p 0000d000 08:03 32594
/lib64/libgcc_s-4.1.2-20070925.so.1
3cfdc00000-3cfdce5000 r-xp 00000000 08:03 1209055
/usr/lib64/libstdc++.so.6.0.8
3cfdce5000-3cfdee5000 ---p 000e5000 08:03 1209055
/usr/lib64/libstdc++.so.6.0.8
3cfdee5000-3cfdeeb000 r--p 000e5000 08:03 1209055
/usr/lib64/libstdc++.so.6.0.8
3cfdeeb000-3cfdeee000 rw-p 000eb000 08:03 1209055
/usr/lib64/libstdc++.so.6.0.8
3cfdeee000-3cfdf00000 rw-p 3cfdeee000 00:00 0
2aaaaaaab000-2aaaaaabe000 rw-p 2aaaaaaab000 00:00 0
2aaaaaad3000-2aaaaaf1e000 rw-p 2aaaaaad3000 00:00 0
2aaaac000000-2aaaac021000 rw-p 2aaaac000000 00:00 0
2aaaac021000-2aaab0000000 ---p 2aaaac021000 00:00 0
7fff02c63000-7fff02c79000 rw-p 7ffffffe9000 00:00 0
[stack]
7fff02dc1000-7fff02dc4000 r-xp 7fff02dc1000 00:00 0
[vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
[vsyscall]

......
/var/spool/torque/mom_priv/jobs/2988.ntpl.b.SC: line 43: 8610 Aborted
            ./lmp_serial -var datafile data.cnt < in.nvt

in.nvt (784 Bytes)

data.cnt (102 KB)

Dear LAMMPS developers,

I ran an NVT calculation of two defective tubes welding. The job crashed
after ~130000 steps. The error seems to be related to memory allocation. I
have copied the error messages below and attached my input file. Can anyone
point out what is going wrong?

there are a lot of possible reasons. the error message comes from deep
inside the C library (glibc) and indicates that there is likely a
programming error, a hardware defect, or that your system is behaving
very badly and violates some of the assumptions about a well-behaved
simulation that LAMMPS makes to avoid having to do many tests in
time-critical regions of the code.

nobody is going to run a test job that would take _that_ long to
(potentially) reproduce the error.

the first thing you should do is to update to the latest version of
LAMMPS with all bugfixes applied. before anybody is going to even
try reproducing this, you should use a code that includes all available
corrections, so that nobody spends time fixing a bug that has already
been fixed.

then i would suggest running with periodic restarts to see if the
issue is reproducible under the given circumstances. if yes, you can
take the last working restart before the crash, feed it to restart2data,
and see if a run started from it still reproduces the issue, but much
sooner than before. if yes, come back with those files, and there is a
much better chance of having this sorted out.
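
a minimal sketch of the periodic-restart idea in LAMMPS input syntax. the original in.nvt is not shown in this thread, so the interval (10000 steps) and the file name root (weld.restart) are assumptions chosen for illustration only:

```
# --- in the original input, before the run command ---
# write a binary restart file every 10000 steps (interval is an assumption)
# files are named weld.restart.10000, weld.restart.20000, ...
restart         10000 weld.restart

# --- in a separate follow-up input, replacing the read_data line ---
# resume from the last file written before the crash near step ~130000
# (weld.restart.120000 is hypothetical and follows from the assumed interval)
read_restart    weld.restart.120000
```

fixes are not carried over by a restart file, and a potential that is read from a parameter file typically needs its pair_style and pair_coeff lines repeated, so the rest of the input would have to be specified again after read_restart.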

if you want to debug on your own, you could try using valgrind, but
since this makes execution very, _very_ slow, you first want to be
able to get to the critical point as fast as possible.

if it is an issue that only shows up after a certain time since beginning
of the run, you are between a rock and a hard place...

cheers,
   axel.

dear haibin,

Dear LAMMPS developers,

     I ran an NVT calculation of two defective tubes welding. The job crashed
after ~130000 steps. The error seems to be related to memory allocation. I
have copied the error messages below and attached my input file. Can anyone
point out what is going wrong?

i have been "playing" with your system to see how well it responds
to OpenMP parallelization instead of MPI (and found that it responds
very well, i.e. it parallelizes much better and more consistently).

while running i noticed that your CNT pair starts rotating, yet
you have a fixed box that is too small. perhaps one of your CNTs
pokes out of the box and you "lose" atoms (why do you tell LAMMPS
to ignore lost atoms? this is very bad for your model). this may
confuse the AIREBO potential, since it maintains its own neighbor
lists, which may not expect the sudden loss of atoms, and thus
result in memory corruption.

why don't you use shrinkwrap boundary conditions instead?
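
a minimal sketch of what that could look like in the input. the original in.nvt is not shown here, so where these lines go and what they replace are assumptions:

```
# shrink-wrapped, non-periodic boundaries in x, y and z:
# the box faces move so that they always enclose all atoms.
# this has to appear before the box is defined (e.g. before read_data).
boundary        s s s

# stop with an error as soon as an atom is lost instead of ignoring it
# (this is the default behavior; it replaces any "lost ignore" setting)
thermo_modify   lost error
```

with shrink-wrapped faces a rotating tube can no longer leave the box, and keeping the lost-atom check enabled makes a run stop with a clear message rather than continuing with missing atoms.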

cheers,
    axel.

thanks Axel.

Your comments make a lot of sense to me. I will try fixing the ends of
the tubes and see if the error comes up again.

best regards
haibin