[lammps-users] Using MPICH

Dear lammps users,

I have a problem when I launch a simulation using MPICH on 4 machines (24 processors).
The simulation contains carbon atoms (AIREBO potential, simple cubic lattice with 5 angstrom spacing, gas phase) and argon atoms (also a 5 A sc lattice in the gas phase, LJ potential).
When I start the simulation on 4 machines, it fails with the following error:

[[email protected]…1234… methane.06.03.2009]$ /usr/local/mpich2-1.0.7ver-sock/bin/mpiexec -n 32 /home/grid2/lmp_g++_poems < in.synthesys
LAMMPS (21 May 2008)
Lattice spacing in x,y,z = 5 5 5
Created orthogonal box = (0 0 0) to (105 110 110)
2 by 4 by 4 processor grid
Created 1452 atoms
Created 2904 atoms
Created 1452 atoms
Created 2420 atoms
Created 1936 atoms
2904 atoms in group carbon
7260 atoms in group argon
Setting up run …
rank 23 in job 9 w7.gridzone.ru_46939 caused collective abort of all ranks
exit status of rank 23: killed by signal 11
rank 15 in job 9 w7.gridzone.ru_46939 caused collective abort of all ranks
exit status of rank 15: killed by signal 11
rank 12 in job 9 w7.gridzone.ru_46939 caused collective abort of all ranks
exit status of rank 12: killed by signal 11
rank 2 in job 9 w7.gridzone.ru_46939 caused collective abort of all ranks
exit status of rank 2: killed by signal 11
rank 0 in job 9 w7.gridzone.ru_46939 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

However, if I start the same simulation on one machine (8 processors), it runs correctly.

Also, if I change the lattice spacing from 5 A to 10 A, everything works on 4 machines as well.

Could you suggest what might be causing this?

Thanks in advance

Konstantin

Can you run anything else on 24 procs of your box, e.g. an MPICH test?
If so, then post a small version of your input script and data file,
and I'll try it out.

Steve

I ran both your small and large scripts (for 100 steps, not 150,000)
w/out problem on my box on 1 and 24 procs, using MPICH.

I am using the most current (fully patched) version of LAMMPS.
If you are as well, then I think you are having an MPI or machine problem,
not a LAMMPS problem.

Steve
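
For anyone comparing their own run against the log in the original post, the numbers it reports can be cross-checked with a short script. This is only a sketch: the values are copied from the log, the variable and function names are made up for illustration, and the even split of the box along each axis is an assumption about how the 2 by 4 by 4 grid divides the domain. It does not diagnose the crash. It only shows that the grid corresponds to the 32 ranks requested with -n 32 (the message text mentions 24 processors) and how large each rank's share of the box is.

```python
from math import prod

# Values copied from the log in the original post.
box_lo = (0.0, 0.0, 0.0)
box_hi = (105.0, 110.0, 110.0)
proc_grid = (2, 4, 4)
created = [1452, 2904, 1452, 2420, 1936]
group_counts = {"carbon": 2904, "argon": 7260}


def subdomain_lengths(lo, hi, grid):
    """Edge lengths of one subdomain, ASSUMING an even split along each axis."""
    return tuple((h - l) / n for l, h, n in zip(lo, hi, grid))


n_ranks = prod(proc_grid)          # ranks implied by the processor grid
total_atoms = sum(created)         # sum of all "Created N atoms" lines
sub = subdomain_lengths(box_lo, box_hi, proc_grid)

print("MPI ranks implied by grid:", n_ranks)
print("total atoms created:", total_atoms)
print("atoms in groups:", sum(group_counts.values()))
print("subdomain edge lengths (A):", sub)
print("average atoms per rank:", total_atoms / n_ranks)
```

With the values above, the grid implies 32 ranks, the atom totals from the "Created" lines and from the two groups agree, and each rank averages a few hundred atoms in a 52.5 x 27.5 x 27.5 A subdomain.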