[lammps-users] LAMMPS code gets killed by error signal 9

NANDU_GOPAN · November 20, 2010, 6:17am

Hello all,

I have compiled LAMMPS (31 Oct 2010 version) on both my workstation and our institute's cluster. Both builds use the same FFTW2 package. I tried running a small input script for the equilibration of 906 water molecules. The initial structure was created in three steps. First, the .pdb file was created with PACKMOL (with a 2 Angstrom tolerance between molecules). Then, from this .pdb file, a .psf file was created using VMD. Finally, the structure file for LAMMPS (waternew.txt) was created using the TopoTools package.

The input runs fine on the workstation but gets killed on the cluster (with the error: "caused collective abort of all ranks exit status of rank 0: killed by signal 9"). Could anyone point out what is going wrong? Could it be because of the high initial energy? I am attaching the input files and the first few lines of the log files from both the workstation and the cluster.

Thanks,
Nandu Gopan

waternew.txt (197 KB)

in.waterbulk (2.46 KB)

log.workstation (3.85 KB)

log.cluster (1.34 KB)

akohlmey · November 20, 2010, 5:46pm

The code runs well on the workstation but gets killed on the cluster (with
the error: caused collective abort of all ranks exit status of rank 0:

It doesn't get killed when running in parallel for me.

I suggest you try running some of the example or benchmark
inputs on your cluster and see if they work or fail.
If they fail, you should have the cluster checked out.
If they work, the problem lies with your own setup.
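
For anyone repeating this check, here is a minimal Python sketch of a wrapper that runs a command and reports whether it exited normally or was killed by a signal. Signal 9 is SIGKILL, which a process cannot catch; on clusters it is often sent by the kernel's out-of-memory handling or by a batch system enforcing limits, although nothing in this thread establishes that as the cause here. The helper names `describe_exit` and `run_and_report` are made up for illustration, and the LAMMPS command line in the comment is an assumption to be replaced with whatever your build and launcher actually use. Note also that an MPI launcher may report a killed rank through its own message and a nonzero exit status rather than dying from the signal itself.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a readable message."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        return f"killed by signal {signum} ({name})"
    if returncode == 0:
        return "exited normally"
    return f"exited with status {returncode}"


def run_and_report(cmd):
    """Run cmd (a list of arguments) and return (exit message, combined output)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return describe_exit(result.returncode), result.stdout


if __name__ == "__main__":
    # Stand-in command: a child Python process that kills itself with SIGKILL.
    # For a real check, substitute your own launcher and executable, e.g.
    # (hypothetical names): ["mpirun", "-np", "4", "lmp_mymachine", "-in", "in.lj"]
    demo = [sys.executable, "-c",
            "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    message, output = run_and_report(demo)
    print(message)
```

Running the same wrapper over a stock benchmark input and over the failing input makes it easier to see whether only one of them dies, which is the comparison suggested above.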

axel.

NANDU_GOPAN · November 21, 2010, 8:20am

Thanks, Axel. I think it's an issue with FFTW2 on the cluster.

Regards,
Nandu Gopan