Out of range atoms - cannot compute PPPM

Dear all,
I have a "architecture dependent" problem

I have a system with ~800 thousand atoms; cell size in Angstrom:
829440 atoms

2 atom types
0.0 319.27280 xlo xhi
0.0 138.2492 ylo yhi
0.0 176.0 zlo zhi

Lennard-Jones + charges.
I paste the input at the end of the e-mail.

If I run the system on a Cray XE6 on, e.g., 4096 cores, it works perfectly:
AMD Interlagos, MKL, FFTW3, Intel 12 compiler and mpiexec (see makefile.cray).

It works perfectly (the system is "physically sound").

If I run it on the "superMUC" architecture at the Leibniz Supercomputing Centre:
Sandy Bridge-EP Intel Xeon E5-2680, Intel 12 compiler, MKL, FFTW, IBM MPI POE (see makefile.muc).
It works on, e.g., 2048 cores, but on 4096 cores I get

"Out of range atoms - cannot compute PPPM" already at the first step.

What could be the reason for this, and what could I do?

Thanks in advance for any support

Kind regards,

Carlo
units real

atom_style full

boundary p p p

read_data initposh.dat

mass 1 12.011
mass 2 0.0001

velocity all create 0.00000 87287

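# hybrid/overlay: the (apparently customized) lj/charmm/coul/charmm/inter short-range style
# is overlaid with coul/long, whose long-range part is handled by PPPM below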
pair_style hybrid/overlay lj/charmm/coul/charmm/inter 29.5 30 0 0 coul/long 20.0

pair_coeff 1 1 lj/charmm/coul/charmm/inter 0 0 0 0 0.0298 3.4
pair_coeff 1 2 lj/charmm/coul/charmm/inter 0 0 0 0 0.0298 3.5
pair_coeff 2 2 lj/charmm/coul/charmm/inter 0 0 0 0 0.0298 3.6
pair_coeff * * coul/long
pair_modify table 0

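# very tight relative force accuracy (1e-10); this is what drives the large 960x540x625 FFT grid seen in the logs below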
kspace_style pppm 1.0e-10

neighbor 0.5 bin
neigh_modify every 20 delay 0 check no one 20000 page 200000

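# every molecule is treated as one rigid body, thermostatted by Langevin at 200 K (damp 500 fs, seed 123456)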
fix 1 all rigid molecule langevin 200.0 200.0 500.0 123456

thermo 25
thermo_style custom step temp ke pe etotal
thermo_modify format 5 %22.14g

run 100

makefile.muc (2.84 KB)

makefile.cray (3.1 KB)

I assume you are using the same (current) version
of LAMMPS on both machines. At start-up it prints
the PPPM info about the grid size, proc layout, etc.
Is that identical in both runs? Sometimes a compiler
with an aggressive optimization switch will do
something bad, so you might compile on the bad
machine with a lower optimization and see if the
behavior changes.

Steve

Dear Steve,

sure: same code (9 March 2013) on both architectures; I also tried -O as the
optimization level on superMUC.

What caught my attention is the unusual processor grid coming out of
superMUC with 4096 cores:

PPPM superMUC 2048 cores:

LAMMPS (9 Mar 2013)
Scanning data file ...
Reading data file ...
  orthogonal box = (0 0 0) to (319.273 138.249 176)
  16 by 8 by 16 MPI processor grid
  829440 atoms
Finding 1-2 1-3 1-4 neighbors ...
  0 = max # of 1-2 neighbors
  0 = max # of 1-3 neighbors
  0 = max # of 1-4 neighbors
  1 = max # of special neighbors
9216 rigid bodies with 829440 atoms
PPPM initialization ...
  G vector (1/distance)= 0.214552
  grid = 960 540 625
  stencil order = 5
  estimated absolute RMS force accuracy = 4.22424e-08
  estimated relative force accuracy = 1.27212e-10
  using double precision FFTs
  3d grid and FFT values/proc = 228068 163200
Setting up run ...
Memory usage per processor = 55.7624 Mbytes
Step Temp KinEng PotEng TotEng
       0 0 0 -14231513 -14231512.745613
      25 18.365988 1009.0147 -14231516 -14230507.019635

PPPM superMUC 4096 cores:

LAMMPS (9 Mar 2013)
Scanning data file ...
Reading data file ...
  orthogonal box = (0 0 0) to (319.273 138.249 176)
  683 by 2 by 3 MPI processor grid
  829440 atoms
Finding 1-2 1-3 1-4 neighbors ...
  0 = max # of 1-2 neighbors
  0 = max # of 1-3 neighbors
  0 = max # of 1-4 neighbors
  1 = max # of special neighbors
9216 rigid bodies with 829440 atoms
PPPM initialization ...
  G vector (1/distance)= 0.214552
  grid = 960 540 625
  stencil order = 5
  estimated absolute RMS force accuracy = 4.22424e-08
  estimated relative force accuracy = 1.27212e-10
  using double precision FFTs
  3d grid and FFT values/proc = 478656 112860
Setting up run ...
Memory usage per processor = 74.2886 Mbytes
Step Temp KinEng PotEng TotEng
       0 0 0 -nan -nan
ERROR on proc 455: Out of range atoms - cannot compute PPPM (pppm.cpp:1667)

PPPM Cray 4096 cores:

LAMMPS (9 Mar 2013)
Scanning data file ...
Reading data file ...
  orthogonal box = (0 0 0) to (319.273 138.249 176)
  32 by 8 by 16 MPI processor grid
  829440 atoms
Finding 1-2 1-3 1-4 neighbors ...
  0 = max # of 1-2 neighbors
  0 = max # of 1-3 neighbors
  0 = max # of 1-4 neighbors
  1 = max # of special neighbors
9216 rigid bodies with 829440 atoms
PPPM initialization ...
  G vector (1/distance)= 0.214552
  grid = 960 540 625
  stencil order = 5
  estimated absolute RMS force accuracy = 4.22424e-08
  estimated relative force accuracy = 1.27212e-10
  using double precision FFTs
  3d grid and FFT values/proc = 125948 86400
Setting up run ...
Memory usage per processor = 37.9032 Mbytes
Step Temp KinEng PotEng TotEng
       0 0 0 -14231513 -14231512.745613
      25 18.365988 1009.0147 -14231516 -14230507.019635

Look at the number of processors along each spatial direction in your output. Something is fishy.
Daniel

Dear Daniel, sure, with a processor grid of ~600 in x it cannot work.
My problem is: why does this happen, and how can I avoid it?

Kind regards

Carlo

> Dear Daniel, sure, with a processor grid of ~600 in x it cannot work.

it should work, too.

> My problem is: why does this happen, and how can I avoid it?

well, did you request 4096 or 4098 CPUs?
if you requested 4096 in both cases, you first need to find out how it is
possible that LAMMPS would think it has 4098 CPUs.

axel.
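(Indeed, the processor grids in the logs above multiply out to different totals:

  683 x 2 x 3  = 4098 MPI ranks in the failing superMUC run
  32 x 8 x 16  = 4096 in the working Cray run
  16 x 8 x 16  = 2048 in the working superMUC run)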

You must have specified 4098 procs, not 4096.
LAMMPS will throw an error if the number of
procs it ends up with for Px x Py x Pz is not equal
to the total # of procs allocated.

Steve
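For what it's worth (not something used in this thread), the domain decomposition
can also be pinned explicitly with the processors command, placed before read_data;
a minimal sketch for a 4096-core run:

processors 16 16 16   # 16 x 16 x 16 = 4096; LAMMPS stops with an error if this product does not match the allocated MPI ranks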

Dear all, I have to check; I am waiting for the machine to come back from
maintenance. I apologize for this.
I really hope that this is the case, but I am still sure the problem
will be there with the correct number of cores.

I'll come back with apologies or further error details as soon as possible.

Kind regards

Carlo

Dear all, sorry for being so late, but in the last days there were
problems with superMUC.

I HAVE TO APOLOGIZE

Both Axel and Steve were of course right in pointing out that I
specified the wrong number of cores (4098 instead of 4096).
I still have problems with some smaller inputs, but the one I described
before is now stable.

If I manage to create a simple and reproducible error case I will be
back to bother you about a possible bug; otherwise I have no bug to report.

Kind regards

Carlo