LAMMPS, SLURM, mpirun and --ntasks

Dear LAMMPS users,

A few days ago I posted a thread asking for help because LAMMPS was not working properly on our cluster with SLURM. Our cluster is composed of several nodes, each with 24 cores and 1 K80 GPU per node, i.e. 2 GPUs, since 1 K80 = 2 K40.
Using a typical sbatch script to run LAMMPS on different nodes in parallel:

#!/bin/sh

#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --ntasks=16
#SBATCH --partition=gpu
#SBATCH --gres=gpu:kepler:2
#SBATCH --time=01:00:00

module load mvapich2 cuda pythonlibs/3.5.2/scikit-learn/0.17.1

# -sf gpu switches LAMMPS to the GPU-accelerated styles; -pk gpu 2 tells the GPU package to use 2 GPUs per node
mpirun -np 16 lammps_30Jul2016 -sf gpu -pk gpu 2 -in in.input

exit 0

According to this script, I was expecting LAMMPS to run on 2 nodes, with a total of 16 cores and 2 GPUs per node, i.e. 4 GPUs in total.

However, according to the output there were 8 procs per device, which led me to think that only 2 GPUs were being used instead of 4: with 16 ranks spread over 4 GPUs I would have expected 4 procs per device, whereas 16 ranks over only 2 GPUs gives exactly 8.

After a few headaches, and with the help of my sysadmin, we figured out what was happening.

Using the script as is, we found out that:

on node 0:

  • 15 cores busy
  • 8 processes on GPU0
  • 8 processes on GPU1

on node 1:

  • 1 core busy
  • GPUs not used

That is, the work is heavily unbalanced between the processors and the GPUs. Moreover, each node has 24 cores, so there was still room for one more process on node 0, yet the system preferred to use one core on node 1 instead. Really not efficient.
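For anyone who wants to check the placement on their own cluster, commands along these lines show where the tasks and the GPU processes actually end up (just a sketch; the exact commands and the binary name depend on your site):

# from inside the allocation: how SLURM spreads the job's tasks over the nodes
srun hostname | sort | uniq -c

# on each compute node, while the job is running:
nvidia-smi                           # lists the processes attached to each GPU
ps -eo pid,psr,comm | grep lammps    # shows which core (psr) each rank is running on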

However, if you comment out the #SBATCH --ntasks=16 line in the script above, it is a different story:

on node 0:

  • 8 cores busy
  • 4 processes on GPU0
  • 4 processes on GPU1

on node 1:

  • 8 cores busy
  • 4 processes on GPU0
  • 4 processes on GPU1

which is what we expect. This time the work is well balanced and ALL the GPUs are used.
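For readers on a similar setup, here is a sketch of the script that gives this balanced layout, i.e. the same script with the --ntasks line commented out. The --ntasks-per-node line is shown only as a possible alternative for making the 8-tasks-per-node intent explicit; adapt it to your own system before relying on it.

#!/bin/sh

#SBATCH --job-name=test
#SBATCH --nodes=2
##SBATCH --ntasks=16              # commented out: this line caused the unbalanced placement
##SBATCH --ntasks-per-node=8      # possible alternative: state the 8-tasks-per-node layout explicitly
#SBATCH --partition=gpu
#SBATCH --gres=gpu:kepler:2
#SBATCH --time=01:00:00

module load mvapich2 cuda pythonlibs/3.5.2/scikit-learn/0.17.1

mpirun -np 16 lammps_30Jul2016 -sf gpu -pk gpu 2 -in in.input

exit 0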

I hope this information can help those who are working on a similar system.

Have a nice week-end.
Christophe
