Hi LAMMPS users,
I’ve been running LAMMPS on a cluster that uses the SLURM scheduler on Linux, and MPI parallel runs worked fine until the cluster was taken down for maintenance (a software update). Since the maintenance, jobs no longer run in parallel: even when I assign several nodes to a job, each MPI task runs on a 1 by 1 by 1 MPI processor grid, and the tasks write duplicate output into the same file (see below).
working directory = /wrk/lnl5/vis_nemd/273K_125mol/2.0e-7
SLURM_SUBMIT_HOST = hercules.hpc.nist.gov
SLURM_JOBID=153461
SLURM_JOB_NODELIST=h[310-311]
SLURM_NNODES=2
SLURM_NTASKS=16
LAMMPS (12 Dec 2018)
using 2 OpenMP thread(s) per MPI task
[the two lines above repeat 8 times in total, once per MPI task]
Reading data file …
[repeated 8 times, once per task]
triclinic box = (15.043 15.043 15.043) to (64.957 64.957 64.957) with tilt (0 0 0)
1 by 1 by 1 MPI processor grid
reading atoms …
[the three lines above repeat 8 times in total, once per task]
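To check whether this is specific to LAMMPS, a bare-MPI test like the sketch below could be run (mpicheck.c is just a scratch file name, not something from my job). If every rank reports "of 1" instead of "of 16", then the environment itself reproduces the size-1 grids in the log above, independent of LAMMPS:

cat > mpicheck.c << 'EOF'
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this task's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total tasks in the communicator */
    printf("rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
mpicc mpicheck.c -o mpicheck
mpirun -np 16 ./mpicheck    # or: srun -n 16 ./mpicheck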
The LAMMPS version is 12 Dec 2018. I loaded the mpi/openmpi-x86_64 and intel modules for the run. I also tried other MPI modules (mpich-x86_64, mpich2-x86_64, mvapich-x86_64), but the job still does not run in parallel. Attached are the SLURM output (from a run of only a few steps) and my job submission script for reference.
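One thing I suspect is a mismatch between the mpirun used to launch the job and the MPI library the LAMMPS binary was linked against. I can post the output of checks like these if that would help (lmp_mpi is a placeholder for the actual executable name, which may differ on my system):

module list                          # which MPI module is actually loaded
which mpirun && mpirun --version     # the launcher that ends up on the PATH
ldd $(which lmp_mpi) | grep -i mpi   # the MPI library the binary links against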
I’ve tried a number of possible solutions I found online, but none of them worked. I hope someone here can offer some suggestions.
Thanks,
Lingnan
slurm-153461.out (37.9 KB)
sbjob1.sh (531 Bytes)