PPPM crash

Hi all, I am seeing the following crash when I use the attached input files with today’s source on 5 processors only. The input deck successfully executes on 1, 2, 3, 4, and 16 processors. Any suggestions?
Jeremy

PPPM initialization …
G vector (1/distance)= 0.320324
grid = 5 5 5
stencil order = 5
estimated absolute RMS force accuracy = 0.000257856
estimated relative force accuracy = 1.79071e-05
using double precision FFTs
WARNING: Reducing PPPM order b/c stencil extends beyond neighbor processor (pppm.cpp:238)
G vector (1/distance)= 0.312743
grid = 6 6 6
stencil order = 4
estimated absolute RMS force accuracy = 0.000380268
estimated relative force accuracy = 2.64082e-05
using double precision FFTs
WARNING: Reducing PPPM order b/c stencil extends beyond neighbor processor (pppm.cpp:238)
G vector (1/distance)= 0.29689
grid = 8 8 8
stencil order = 3
estimated absolute RMS force accuracy = 0.000831857
estimated relative force accuracy = 5.77693e-05
using double precision FFTs
WARNING: Reducing PPPM order b/c stencil extends beyond neighbor processor (pppm.cpp:238)
G vector (1/distance)= 0.289979
grid = 24 24 24
stencil order = 2
estimated absolute RMS force accuracy = 0.00115531
estimated relative force accuracy = 8.02318e-05
using double precision FFTs
WARNING: Reducing PPPM order b/c stencil extends beyond neighbor processor (pppm.cpp:238)
G vector (1/distance)= 0.285961
grid = 1440 1440 1440
stencil order = 1
estimated absolute RMS force accuracy = 0.0013935
estimated relative force accuracy = 9.67734e-05
using double precision FFTs
brick FFT buffer size/proc = -1677802081 597196800 -1902130528

in.test (1.68 KB)

water.init (5.62 KB)

> Hi all, I am seeing the following crash when I use the attached input files
> with today's source on 5 processors only. The input deck successfully
> executes on 1, 2, 3, 4, and 16 processors. Any suggestions?

the primary advice would be:
don't run on 5 processors, then.

the problem is that 5 is prime, so with
5 processors you force the domain
decomposition to be in one direction only
(5x1x1), and that doesn't agree well with
your (tiny) system size, cutoff, and pppm
parameters.
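
To illustrate why a prime processor count forces a slab decomposition, here is a minimal sketch that picks the most cubic px x py x pz factorization for a cubic box. The helper name most_cubic_grid is made up for this example and this is not LAMMPS's own processor-mapping routine; it only assumes that, for a cubic box and a fixed processor count, minimizing px + py + pz minimizes the surface area of each subdomain.

```cpp
#include <array>

// Hypothetical helper (not LAMMPS code): return the factorization
// nprocs = px * py * pz with px >= py >= pz that minimizes px + py + pz.
// Assumption: cubic box, so the smallest sum gives the smallest
// subdomain surface area.
std::array<int, 3> most_cubic_grid(int nprocs) {
    std::array<int, 3> best{nprocs, 1, 1};  // 1-D slabs are always possible
    for (int px = 1; px <= nprocs; ++px) {
        if (nprocs % px != 0) continue;
        const int rest = nprocs / px;
        for (int py = 1; py <= rest; ++py) {
            if (rest % py != 0) continue;
            const int pz = rest / py;
            if (px < py || py < pz) continue;  // keep px >= py >= pz
            if (px + py + pz < best[0] + best[1] + best[2]) {
                best = {px, py, pz};
            }
        }
    }
    return best;
}
```

For 5 processors the only choice is 5 x 1 x 1, while 4 gives 2 x 2 x 1 and 16 gives 4 x 2 x 2. Of the processor counts tried in this thread, 5 is the one that cuts the box into the most slabs along a single direction.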

for such a small system, it should
be faster to run plain ewald in any
case, and that doesn't suffer from
the stencil size limitations.

cheers,
    axel.

Hi Axel, I think the more general question is how can we easily tell that we have specified a partition that will cause problems for lammps? The context is that for developing within the code, it is useful to know if the error is caused by a bug in fresh code or foolishness in selection of the test problem.
Jeremy

> Hi Axel, I think the more general question is how can we easily tell that we have specified a partition that will cause problems for lammps? The context is that for developing within the code, it is useful to know if the error is caused by a bug in fresh code or foolishness in selection of the test problem.

that is an almost impossible goal. the best example
is the recurring question of "lost atoms". in principle,
LAMMPS could be rewritten to more carefully monitor
what is going on and print a more meaningful and
helpful error message, i.e. do the diagnostics not
during the re-neighboring but during all steps that
could cause this problem.

unfortunately, that would make LAMMPS also
incredibly slow, since this is touching performance
critical code paths. one has to make a choice of
how much performance one is willing to sacrifice
for convenience and clarity of error handling.

also, it is not always easy to tell which is the
proper way to deal with the problem.

with your specific input example the unfortunate
choice of settings is revealed by the (rather cryptic)
warning of:
Reducing PPPM order b/c stencil extends beyond neighbor processor (pppm.cpp:238)

but whether the proper resolution is to
switch to ewald summation, or to use fewer
processors, or to request tighter pppm
convergence, or to use a shorter coulomb
cutoff is impossible to tell without
knowing more about the context.
what may be a foolish choice in one case,
can be the right one in another.

that being said, it would certainly be very
desirable to have a (large) choice of validation
inputs around that would be routinely used
to verify that features which are known to work
are not broken by new code. given the
vast number of possible permutations of
features and parameters in LAMMPS and
also considering the fact that certain
bugs only manifest themselves under
specific conditions and when running in
parallel with a particular processor
distribution, this is a daunting task.

this would be an ideal way for people
with little programming experience to
contribute to LAMMPS and repay
developers like you for your effort in
developing new features. sadly the
culture of helping developers of free
software as a way of thanking and
paying respect to their efforts
died many, many years ago. :-(

axel.

I think that's actually a helpful error message.
It is telling you exactly what it is doing. It is
reducing the PPPM stencil (from the default of
5 grid points) b/c it extends beyond the
nearest neighbor proc. It keeps reducing
it one stencil point at a time until it is too tiny.
Presumably b/c your problem is tiny but
the grid is also then huge. So something is clearly
wrong with using PPPM for this problem.
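
A likely explanation for the negative buffer sizes in the log, offered as an assumption rather than something confirmed in this thread: a 1440 x 1440 x 1440 grid has more points than a signed 32-bit integer can hold. The sketch below does the arithmetic in 64 bits; grid_points and fits_in_int32 are names made up for this example.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: total number of grid points, computed in
// 64 bits so the product itself cannot wrap around.
std::int64_t grid_points(int nx, int ny, int nz) {
    return static_cast<std::int64_t>(nx) * ny * nz;
}

// True if the value can be stored in a signed 32-bit integer.
bool fits_in_int32(std::int64_t n) {
    return n >= std::numeric_limits<std::int32_t>::min() &&
           n <= std::numeric_limits<std::int32_t>::max();
}
```

grid_points(1440, 1440, 1440) is 2985984000, which is above the 32-bit limit of 2147483647. Divided over 5 processors it is 597196800, which matches the one positive value in the "brick FFT buffer size/proc" line of the log.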

I'll ask Paul to look into why PPPM is choosing
such a huge grid and small stencil. Can you
send him (or post) the input files for this test?
And a script that is as vanilla as possible.
E.g. doesn't use ATC if possible, etc.

Steve

> I think that's actually a helpful error message.
> It is telling you exactly what it is doing. It is

it is helpful for people that know the specifics of
how PPPM is implemented and parallelized.

> reducing the PPPM stencil (from the default of
> 5 grid points) b/c it extends beyond the
> nearest neighbor proc. It keeps reducing
> it one stencil point at a time until it is too tiny.
> Presumably b/c your problem is tiny but
> the grid is also then huge. So something is clearly
> wrong with using PPPM for this problem.
>
> I'll ask Paul to look into why PPPM is choosing
> such a huge grid and small stencil. Can you
> send him (or post) the input files for this test?
> And a script that is as vanilla as possible.
> E.g. doesn't use ATC if possible, etc.

a suitable test was attached to jeremy's very
first e-mail. i dug into the code a little bit and
i believe the issue is that the pppm order
must not be 1 but at least 2.

the "killer" seems to be this code segment
around line 265 in KSPACE/pppm.cpp

    // nlower,nupper = stencil size for mapping particles to PPPM grid

    nlower = -(order-1)/2;
    nupper = order/2;

this will lead to an invalid grid stencil computation
because nlower == nupper (both are 0) for order == 1.
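
The integer arithmetic can be checked for every order the log goes through. This is a small standalone sketch that reuses the two expressions quoted above; the StencilBounds struct and the stencil_bounds function are made up for illustration and are not part of pppm.cpp.

```cpp
// Hypothetical container for the two stencil offsets.
struct StencilBounds {
    int nlower;
    int nupper;
};

// Same expressions as in the quoted snippet. C++ integer division
// truncates toward zero, so -(order - 1) / 2 is 0 for order 1 and 2.
StencilBounds stencil_bounds(int order) {
    StencilBounds b;
    b.nlower = -(order - 1) / 2;
    b.nupper = order / 2;
    return b;
}
```

This gives (nlower, nupper) = (-2, 2) for order 5, (-1, 2) for order 4, (-1, 1) for order 3, (0, 1) for order 2, and (0, 0) for order 1. Order 1 is the only case where the two bounds coincide.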

with the limit set to order 2, the problematic
input actually runs fine (only at higher than
requested pppm accuracy) and will run fine
for even higher processor counts per dimension.

i'll make a few more tests and will send you
the improved code.

cheers,
    axel.

here is the pppm bugfix, including the
corresponding change in pppm/cuda.

enjoy,
     axel.

lammps-pppm-order-fix.tar.gz (28.1 KB)

Thanks Axel and Steve, this helps us out a lot. I apologize for some of these slightly weird problems we send you guys. This (and many others) are related to our benchmarking tests which we try to keep small for fast execution but in which we also want to fully exercise features as a user would. Feel free to point out the glaring contradiction.
Jeremy

jeremy.

> Thanks Axel and Steve, this helps us out a lot. I apologize for some of these slightly weird problems we send you guys.

no need to apologize. these are good. a lot of these tests
reveal some problems that don't show up normally, but
need to be fixed. as i was stating before, this actually
would need to be done much more rigorously and in a
more automated fashion for more of the functionality of
LAMMPS. i've been trying to develop this to a workable
state multiple times with the help of undergraduate students
(the only "free" workforce available to me here), but so
far nothing really led to something that is generic,
simple and maintainable enough, so that it could be
done in a more automated fashion and also as optional
step of the LAMMPS installation (as validation).

> This (and many others) are related to our benchmarking tests which we try to keep small for fast execution but in which we also want to fully exercise features as a user would. Feel free to point out the glaring contradiction.

right now all i can do is to encourage you.

axel.