LAMMPS, SLURM, mpirun and --ntasks

Dear LAMMPS users,

A few days ago I posted a thread asking for help, since LAMMPS was not working properly on our cluster with SLURM. Our cluster is composed of various nodes, each with 24 cores and 1 K80 GPU per node, i.e. 2 GPUs per node, since one K80 is effectively two K40s.
Using a typical sbatch script to run LAMMPS on different nodes in parallel:


#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --ntasks=16
#SBATCH --partition=gpu
#SBATCH --gres=gpu:kepler:2
#SBATCH --time=01:00:00

module load mvapich2 cuda pythonlibs/3.5.2/scikit-learn/0.17.1

mpirun -np 16 lammps_30Jul2016 -sf gpu -pk gpu 2 -in in.input

exit 0

According to this script, I was expecting LAMMPS to run on 2 nodes, with a total of 16 cores and 2 GPUs per node, i.e. 4 GPUs in total.

However, according to the output there were 8 procs per device, which suggested that only 2 GPUs were being used instead of 4.

After a few headaches, and with the help of my sysadmin, we figured out what was happening.

Using the script as is, we found out that:

on node 0:

  • 15 cores busy
  • 8 processes on GPU0
  • 8 processes on GPU1

on node 1:

  • 1 core busy
  • GPUs not used

That is, the work is heavily unbalanced between processors and GPUs. Moreover, each node has 24 cores, so node 0 had plenty of room for the remaining task, but SLURM placed it on node 1 instead. Really not efficient.
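For anyone who wants to check the layout on their own system, here is a sketch of how we inspected it; the node name node0 is just a placeholder for whatever hostnames your allocation reports:

```shell
# List the nodes SLURM assigned to your job
squeue -u $USER -o "%N"

# On each allocated node (replace node0 with a real hostname),
# see which processes are attached to each GPU and how busy the CPUs are
ssh node0 nvidia-smi
ssh node0 uptime
```

The process list printed by nvidia-smi is how we counted the 8 processes per GPU on node 0 and saw that node 1's GPUs were idle.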

However, if you comment out the line #SBATCH --ntasks=16 in the script above, it is a different story:

on node 0:

  • 8 cores busy
  • 4 processes on GPU0
  • 4 processes on GPU1

on node 1:

  • 8 cores busy
  • 4 processes on GPU0
  • 4 processes on GPU1

which is what we expected: this time the load is well balanced and ALL the GPUs are used.
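For reference, here is a sketch of a revised script that makes the intended layout explicit with --ntasks-per-node rather than relying on SLURM's default placement; the module names and executable are taken from the script above, and --ntasks-per-node=8 is an assumption matching the balanced run we observed:

```shell
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8      # pin 8 tasks to each node instead of a global --ntasks=16
#SBATCH --partition=gpu
#SBATCH --gres=gpu:kepler:2      # request both GPUs of the K80 on each node
#SBATCH --time=01:00:00

module load mvapich2 cuda pythonlibs/3.5.2/scikit-learn/0.17.1

# 2 nodes x 8 tasks/node = 16 MPI ranks; -pk gpu 2 tells LAMMPS to use both GPUs per node
mpirun -np 16 lammps_30Jul2016 -sf gpu -pk gpu 2 -in in.input

exit 0
```

With this layout each GPU should end up serving 4 MPI ranks, as in the balanced run above.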

I hope this information can help those that are working on a similar system.

Have a nice week-end.
