[lammps-users] problems with a very big simulation.

Hi.
In a simulation with about 5 million atoms, LAMMPS gives me
the following error:

LAMMPS (22 Jan 2008)
Scanning data file ...
Reading data file ...
  orthogonal box = (-31 -31 -1) to (31 31 33)
  1 by 1 by 1 processor grid
  5905302 atoms
Finding 1-2 1-3 1-4 neighbors ...
  0 = max # of 1-2 neighbors
  0 = max # of 1-3 neighbors
  0 = max # of 1-4 neighbors
  1 = max # of special neighbors
5905302 atoms in group bloque
33346 atoms in group suelo
WARNING: Resetting reneighboring criteria during minimization
Setting up minimization ...
p0_6380: p4_error: interrupt SIGSEGV: 11

I don't know whether this error comes from LAMMPS or
from the computer. Could someone tell me what this
error means?

Thank you.

> Hi.
> In a simulation with about 5 million atoms, LAMMPS gives me
> the following error:
>

[...]

> Setting up minimization ...
> p0_6380: p4_error: interrupt SIGSEGV: 11

> I don't know whether this error comes from LAMMPS or
> from the computer. Could someone tell me what this
> error means?

segmentation fault happens when you access memory that
you are not allowed to. it could be that you have hit
a limit in your MPI implementation or that you are
simply running out of memory. you seem to be using
MPICH (use OpenMPI instead, it is _much_ better, BTW)
which also has a tendency to segfault whenever there
is a failure.

i'd recompile LAMMPS with the bundled STUBS library
to see whether to blame the MPI library. if you still
get a segfault, you should enable core dumps and
use a debugger to find out where it fails.
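
as a minimal sketch of the "enable core dumps" step, here is a small Python launcher that raises the core-file size limit for the child process only and reports how it exited. the helper name run_with_core_dumps is hypothetical, the stand-in command is just the Python interpreter, and where the core file ends up depends on the system configuration, which is not covered here:

```python
import resource
import subprocess
import sys


def run_with_core_dumps(cmd, stdin_path=None):
    """Run cmd with the core-file size limit raised to its hard maximum."""

    def raise_core_limit():
        # Runs in the child just before exec, similar in spirit to `ulimit -c`.
        _, hard = resource.getrlimit(resource.RLIMIT_CORE)
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))

    stdin = open(stdin_path, "rb") if stdin_path else None
    try:
        result = subprocess.run(cmd, stdin=stdin, preexec_fn=raise_core_limit)
    finally:
        if stdin:
            stdin.close()
    # A negative return code means the child was killed by that signal number.
    return result.returncode


# Harmless stand-in command. For the case in this thread the call would look
# like run_with_core_dumps(["lmp_qed"], stdin_path="in.ganinn").
code = run_with_core_dumps([sys.executable, "-c", "pass"])
print("exit code:", code)
```

on Linux a return code of -11 corresponds to SIGSEGV. the resulting core file can then be opened in a debugger such as gdb (gdb lmp_qed core), preferably with LAMMPS compiled with -g so the backtrace shows source lines.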

cheers,
   axel.


I'd run a smaller problem to see if you can reproduce/debug
the issue.

Steve


Hello.
I tried a smaller problem (only 388772 atoms) and it
gives me the same error.
With run:
[email protected]...:~> lmp_qed < in.ganinn
LAMMPS (22 Jan 2008)
Scanning data file ...
Reading data file ...
  orthogonal box = (-13 -13 -1) to (13 13 15)
  1 by 1 by 1 processor grid
  388772 atoms
Finding 1-2 1-3 1-4 neighbors ...
  0 = max # of 1-2 neighbors
  0 = max # of 1-3 neighbors
  0 = max # of 1-4 neighbors
  1 = max # of special neighbors
388772 atoms in group bloque
5372 atoms in group suelo
Setting up run ...
p0_11642: p4_error: interrupt SIGSEGV: 11

With minimize:
[email protected]...:~> lmp_qed < in.ganinn
LAMMPS (22 Jan 2008)
Scanning data file ...
Reading data file ...
  orthogonal box = (-13 -13 -1) to (13 13 15)
  1 by 1 by 1 processor grid
  388772 atoms
Finding 1-2 1-3 1-4 neighbors ...
  0 = max # of 1-2 neighbors
  0 = max # of 1-3 neighbors
  0 = max # of 1-4 neighbors
  1 = max # of special neighbors
388772 atoms in group bloque
5372 atoms in group suelo
WARNING: Resetting reneighboring criteria during minimization
Setting up minimization ...
p0_13236: p4_error: interrupt SIGSEGV: 11

I also submitted the same problem to the cluster on 8
processors, and it gives the following output:
LAMMPS (22 Jan 2008)
Scanning data file ...
Reading data file ...
  orthogonal box = (-13 -13 -1) to (13 13 15)
  2 by 2 by 2 processor grid
  388772 atoms
Finding 1-2 1-3 1-4 neighbors ...
  0 = max # of 1-2 neighbors
  0 = max # of 1-3 neighbors
  0 = max # of 1-4 neighbors
  1 = max # of special neighbors
388772 atoms in group bloque
5372 atoms in group suelo
WARNING: Resetting reneighboring criteria during minimization
Setting up minimization ...
p7_6617: p4_error: net_recv read: probable EOF on socket: 1
rm_l_7_6632: (3.429688) net_send: could not write to fd=5, errno = 32
p1_16524: p4_error: net_recv read: probable EOF on socket: 343
rm_l_1_16539: (4.257812) net_send: could not write to fd=5, errno = 32
p2_29415: p4_error: interrupt SIGSEGV: 11
rm_l_2_29430: (4.109375) net_send: could not write to fd=5, errno = 32

I am attaching the files used for these runs. Can someone
tell me where the error is? It should not be a lack of
memory, because the cluster has 80 x 8 GB.

The data file is 16 GB.

Thank you for everything.

in.ganinn (3.74 KB)

GANINN.sw (2.3 KB)

Can you post the smallest version of your problem
that crashes with the run command: input script/data file.

Steve

2008/4/2 no tengo nombre <[email protected]...>:

Your simulation has a box of size 10000 cubic Angstroms
with 380K atoms in it. In other words, 35 atoms per
cubic Angstrom. I don't think that's GaN.
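
The density figure can be reproduced from the box bounds and atom count printed in the log. A minimal sketch in Python; the wurtzite GaN reference uses approximate literature lattice constants (a of about 3.19 Angstroms, c of about 5.19 Angstroms, 4 atoms per cell), which are an assumption and not taken from this thread:

```python
import math


def number_density(lo, hi, natoms):
    """Atoms per unit volume for an orthogonal box given its lo/hi corners."""
    volume = math.prod(h - l for l, h in zip(lo, hi))
    return natoms / volume


# Box bounds and atom count copied from the log of the 388772-atom run.
density = number_density((-13, -13, -1), (13, 13, 15), 388772)

# Assumed reference: hexagonal cell volume is (sqrt(3)/2) * a^2 * c, 4 atoms.
a, c = 3.19, 5.19  # approximate values, in Angstroms
gan_density = 4 / (math.sqrt(3) / 2 * a * a * c)

print(f"simulation:    {density:.1f} atoms/A^3")
print(f"GaN (approx.): {gan_density:.3f} atoms/A^3")
print(f"ratio:         {density / gan_density:.0f}x")
```

The simulated box comes out hundreds of times denser than the crystal. A gap of that size is what a length-unit mix-up would produce, since scaling every coordinate by 10 changes the density by a factor of 1000.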

For normal Stillinger-Weber cutoffs of a few Angstroms
this is generating monstrous neighbor lists, which is
causing the crash. If you set

neigh_modify page 1000000 one 100000

you can make it run w/out overflowing the normal
max neighbors/atom assumption.
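
To see why the lists get so large, the neighbors per atom can be estimated as the density times the volume of the cutoff sphere. A rough sketch in Python; the 3.5 Angstrom force cutoff is a hypothetical stand-in for "a few Angstroms" (the real value comes from the potential file), the 2.0 skin is the value on the "neighbor" line of the posted input script, and 4 bytes per stored neighbor index is an assumption:

```python
import math


def neighbors_per_atom(density, cutoff, skin):
    """Expected number of atoms inside a sphere of radius cutoff + skin."""
    r = cutoff + skin
    return density * 4.0 / 3.0 * math.pi * r**3


natoms = 388772
density = natoms / (26 * 26 * 16)  # atoms/A^3, box edges taken from the log

# cutoff=3.5 is a hypothetical value; skin=2.0 matches "neighbor 2 nsq".
per_atom = neighbors_per_atom(density, cutoff=3.5, skin=2.0)
total_bytes = natoms * per_atom * 4  # assumed 4-byte integer per neighbor

print(f"~{per_atom:,.0f} neighbors per atom")
print(f"~{total_bytes / 1e9:.0f} GB for the full neighbor list")
```

With these assumed numbers the estimate is tens of thousands of neighbors per atom, where a physical crystal would have a few tens, and the full list runs to tens of GB. Raising the neigh_modify limits removes the overflow but not the memory demand.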

You're also running with the "nsq" neighbor list
option which will be extremely slow, especially
when you try to run your 5M atom problem.

Steve

2008/4/2 no tengo nombre <[email protected]...>:

Hello.
I added the following line to my previous script:
neigh_modify one 8250 page 82500
It no longer gives the previous error, but now the simulation
cannot allocate all the memory it needs. On the cluster the
job dumps core, and on a computer with 4 GB of RAM the system
monitor shows the LAMMPS process growing to 2.7 GB and then
failing with the message:
Failed to allocate 100000 bytes for array neightlist:pair[i]

Do you know of a way to launch the process with more memory?

Thank you.
> # simulation of a pure InAs structure under compression
> # (with walls, without walls, and then uncompressed)
>
> dimension 3
> boundary s s s
> newton on
> neighbor 2 nsq
> processors 2 2 2
> #neigh_modify delay 5
> atom_style molecular
> pair_style sw
> units metal
>
> # read structure

=== message truncated ===

If you're running out of physical memory, you can't run that big
a problem. You never answered the question as to why your
atom density is so unphysically high.

Steve

Hello.
I solved the problem in my simulation by changing the
length scale (nm to Angstrom).

But now the simulation gives a different error: the atom
positions become NaN.

I am attaching the new script and dump file.

Also, if I replace the input data file with a different one,
the simulation runs perfectly.

What could be the problem?

Thank you for everything.
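
One way to narrow this down is to find the first snapshot in the dump file where a NaN appears. A minimal sketch in Python, assuming the default text dump layout with "ITEM:" section headers; the function name first_nan_timestep and the two-atom sample are hypothetical and only illustrate the parsing:

```python
import math


def first_nan_timestep(lines):
    """Return (timestep, atom_line) for the first atom line holding a NaN."""
    section = None
    timestep = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("ITEM:"):
            section = line[len("ITEM:"):].strip()
            continue
        if section == "TIMESTEP":
            timestep = int(line)
        elif section is not None and section.startswith("ATOMS"):
            for token in line.split():
                try:
                    if math.isnan(float(token)):
                        return timestep, line
                except ValueError:
                    pass  # non-numeric column, ignore it
    return None


# Hypothetical two-snapshot dump used only to exercise the parser.
sample = """ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS
0 1
0 1
0 1
ITEM: ATOMS
1 1 0.1 0.2 0.3
2 1 0.5 0.5 0.5
ITEM: TIMESTEP
100
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS
0 1
0 1
0 1
ITEM: ATOMS
1 1 nan nan nan
2 1 0.5 0.5 0.5
""".splitlines()

print(first_nan_timestep(sample))
```

For a real run, pass an open file instead of the sample, for example open("dump.ganinn"). If NaNs already appear in the first snapshot, the starting configuration or the potential parameters are the first things to check; if they appear later, the trajectory blew up during the run.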

dump.ganinn (475 KB)

GANINN.sw (2.3 KB)

in.ganinn (873 Bytes)

log.lammps (1.52 KB)

data.ganinn2 (364 KB)