[lammps-users] NaN on 8 processors, but not 4. MPICH-2 related?

We're running some fairly modest 25,000-atom Poiseuille flow simulations
on 8 processors and see them crash frequently with no LAMMPS-specific
error, just the message "rank x in job y caused collective abort of all
ranks".
What's peculiar is that I can start a simulation and have it run fine on
4 processors, but the same simulation will give NaN for the pressure at
t=0 if I use 8 processors. It doesn't matter if the 8 processors are on
the same compute node or different compute nodes. In addition, this
happens on all of our clusters, which use MPICH-2, but not on a cluster
at another site that runs the same code and input files with HP-MPI.

Has anyone seen this before? Could it be related to the MPI library?
What else could be causing it?


David R. Heine
Senior Research Scientist
Corning, Inc.
Corning, NY 14831

Tel: (607) 974-3760
Fax: (607) 974-3405