[lammps-users] crash on Blue Gene

Hello,

I am trying to run the water SPC/E example from the LAMMPS web site on a Blue Gene machine. I have compiled LAMMPS and FFTW. The script runs up to the run command, and then almost all MPI processes dump a core file. The output file shows:

[snip]
run 100
PPPM initialization ...
   G vector = 0.274879
   grid = 20 20 20
   stencil order = 5
   RMS precision = 5.1338e-05
   brick FFT buffer size/proc = 2448 300 2448
Setting up run ...

This same example runs fine on 4 MPI processes on my Linux box (LAMMPS linked against fftw-2.1.5).

My questions are: is there a way to get more verbose output from LAMMPS? And has anyone seen a similar problem on a Blue Gene machine?
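For what it's worth, LAMMPS itself can be made a bit more talkative through its standard echo and log input-script commands; a minimal sketch (the log file name below is made up):

```
# near the top of the input script
echo both        # echo every input command to both the screen and the log
log log.debug    # switch thermo/setup output to a fresh log file
```

The same can be done from the command line via the -echo and -log switches, which helps pin down the last command executed before a crash.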

Inputs appreciated.

Thanks,

Just to add one more piece of information that points to a problem in either the FFTW library as compiled on the Blue Gene/P machine or in how LAMMPS calls the library: I ran the LAMMPS LJ benchmark script on several system sizes and the results are fine, and the LJ benchmark does not use FFTW. Also, here are the standard packages I have installed:

->make package-status
Installed NO: package ASPHERE
Installed NO: package CLASS2
Installed NO: package COLLOID
Installed NO: package DIPOLE
Installed NO: package DSMC
Installed NO: package GPU
Installed NO: package GRANULAR
Installed YES: package KSPACE
Installed YES: package MANYBODY
Installed NO: package MEAM
Installed YES: package MOLECULE
Installed YES: package OPT
Installed NO: package PERI
Installed NO: package POEMS
Installed NO: package REAX
Installed NO: package REPLICA
Installed NO: package SHOCK
Installed NO: package SRD
Installed NO: package XTC

Installed NO: package USER-ACKLAND
Installed NO: package USER-ATC
Installed NO: package USER-CD-EAM
Installed NO: package USER-CG-CMM
Installed NO: package USER-EFF
Installed NO: package USER-EWALDN
Installed NO: package USER-IMD
Installed NO: package USER-SMD

Thanks,

valmor,

Just to add one more piece of information that points to a problem in
either the FFTW library as compiled on the Blue Gene/P machine or in
how LAMMPS calls the library. I ran the LAMMPS LJ benchmark script on
different system sizes and the results are fine. The LJ benchmark does
not use FFTW. Also here are the standard packages I have installed:

if you are running on BG/P, you may be interested in trying out
the OpenMP pair styles from my lammps-icms branch for improved
scaling capability. check out: http://goo.gl/oKYI

please try out the peptide example input or the rhodo benchmark
inputs as well, as they do use the FFT.

the important part about fftw is that you have to link against
fftw2 and have it compiled in double precision mode.
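in concrete terms: fftw-2.1.5 builds double precision by default
(with --enable-type-prefix the library comes out as libdfftw), and
the LAMMPS machine makefile then needs FFT settings along these lines
(the install path here is a placeholder):

```make
# FFT settings for fftw2; -DFFT_FFTW selects the FFTW 2.x backend
FFT_INC  = -DFFT_FFTW -I/path/to/fftw-2.1.5/include
FFT_PATH = -L/path/to/fftw-2.1.5/lib
FFT_LIB  = -ldfftw
```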

more advanced FFT options and support for single precision
FFT and a simple internal FFT are - again - in my lammps-icms
branch. the latter may soon surface in the regular lammps
distribution, too.

my suspicion is that you are trying to run a large problem on
too few nodes and that you are running out of memory. running
the test/example inputs should give you confirmation.

cheers,
    axel.

From your output in the first message, you appear
to be running a tiny FFT, which should not cause
memory problems. Axel's suggestion is a good one -
I would just try running one of the examples or
benchmarks (peptide or rhodo) which use PPPM and
FFTs and see if they work. If not, I would
talk to one of your BG/P admins - something is not
configured or set up on your machine the way LAMMPS
expects, I would guess.

Steve

[snip]

if you are running on BG/P, you may be interested in trying out
the OpenMP pair styles from my lammps-icms branch for improved
scaling capability. check out: http://goo.gl/oKYI

Thanks; will take a look.

[snip]

please try out the peptide example input or the rhodo benchmark
inputs as well, as they do use the FFT.

I tried the rhodo benchmark on 128 cores (and also the water-spce example on 128 cores). Same problem; here is the rhodo output:

LAMMPS (3 Oct 2010)
Scanning data file ...
   4 = max bonds/atom
   8 = max angles/atom
   18 = max dihedrals/atom
   2 = max impropers/atom
Reading data file ...
   orthogonal box = (-27.5 -38.5 -36.3646) to (27.5 38.5 36.3615)
   4 by 8 by 4 processor grid
   32000 atoms
   32000 velocities
   27723 bonds
   40467 angles
   56829 dihedrals
   1034 impropers
Finding 1-2 1-3 1-4 neighbors ...
   4 = max # of 1-2 neighbors
   12 = max # of 1-3 neighbors
   24 = max # of 1-4 neighbors
   26 = max # of special neighbors
Finding SHAKE clusters ...
   1617 = # of size 2 clusters
   3633 = # of size 3 clusters
   747 = # of size 4 clusters
   4233 = # of frozen angles
PPPM initialization ...
   G vector = 0.248831
   grid = 25 32 32
   stencil order = 5
   RMS precision = 7.57143e-05
   brick FFT buffer size/proc = 1404 224 1404
Setting up run ...

Then I get a bunch of core files from the processes (which I don't understand).

the important part about fftw is that you have to link against
fftw2 and have it compiled in double precision mode.

I used the fftw-2.1.5 installed on the BG/P, and I also compiled my own (as libdfftw) with the IBM compute-node C compiler.

more advanced FFT options and support for single precision
FFT and a simple internal FFT are - again - in my lammps-icms
branch. the latter may soon surface in the regular lammps
distribution, too.

Will try. Thanks.

my suspicion is that you are trying to run a large problem on
too few nodes and that you are running out of memory. running
the test/example inputs should give you confirmation.

I don't think it is a memory problem: I can run the water-spce example on my laptop and desktop, so 128 cores on BG/P should provide plenty of memory.

Thanks for the help.

The sysadmin guys say they will try to install a newer version of LAMMPS. I still wonder whether it is an FFTW problem. I guess I could try to run one of the check examples from the fftw distribution on one BG/P compute node.

Just double-checking: LAMMPS does not use the FFTW MPI library, correct? libdfftw (or libfftw) is the library to link against.

Thanks,

Just double-checking: LAMMPS does not use the FFTW MPI library, correct?

correct.

libdfftw (or libfftw) is the library to link against.

correct, too.

cheers,
   axel.