Dear all,
I am facing a problem on a cluster using SLURM, on which I can only launch up to 31 tasks per mpirun command. If I try to launch 32 or more MPI tasks, I get the following error:
sys.c:1560 UCX ERROR pthread_create() failed: Resource temporarily unavailable
Following the recommendation in the Open MPI documentation (10.7. Launching with Slurm — Open MPI 5.0.x documentation), I launch the jobs with mpirun instead of srun and pass the --mca pml ucx flag to make sure UCX is used.
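For reference, a failing run with 32 tasks is launched roughly like this (the LAMMPS executable and input file names here are placeholders for my actual ones):

    mpirun --mca pml ucx -np 32 ./lmp -in in.lammps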
I checked that MPI is configured to allow launching more than 31 tasks, and I am attaching a file with the full output from the test (slurm.out (14.3 KB)). The same error occurs when using srun instead of mpirun. The only difference is that, with fewer than 32 tasks, srun additionally prints the following warning before the program runs normally:
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
PMIx stopped checking at the first component that it did not find.
Host: <redacted>
Framework: psec
Component: munge
--------------------------------------------------------------------------
I tried disabling btl/uct, as suggested in the UCX documentation (Running UCX — OpenUCX documentation), to no avail.
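Concretely, what I tried was along these lines (same placeholders as above):

    mpirun --mca pml ucx --mca btl ^uct -np 32 ./lmp -in in.lammps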
To better understand this problem, I created a simple sbatch script, run_sbatch.sh (283 Bytes), that runs the equally simple exec_srun.sh (58 Bytes); both are sketched below, after the error message. By alternating between the srun and mpirun commands in run_sbatch.sh (and, correspondingly, between SLURM_PROCID and OMPI_COMM_WORLD_RANK in exec_srun.sh), I found that srun handled the maximum number of tasks per node (104) normally, whereas mpirun gave the following error:
--------------------------------------------------------------------------
A request was made to bind that would require binding
processes to more cpus than are available in your allocation:
Application: exec_srun.sh
#processes: 104
Mapping policy: BYCORE
Binding policy: CORE
You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------
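For completeness, the two scripts are essentially the following (partition/account directives omitted; I switch between the two launcher lines, and between OMPI_COMM_WORLD_RANK and SLURM_PROCID, when alternating):

run_sbatch.sh:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=104
    # switch between the two launchers here:
    mpirun ./exec_srun.sh
    # srun ./exec_srun.sh

exec_srun.sh:

    #!/bin/bash
    # print the rank of each task (SLURM_PROCID when launched with srun)
    echo "Hello from task $OMPI_COMM_WORLD_RANK"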
My tests showed that with mpirun I could launch at most 52 tasks, which probably means that Open MPI does not take hyperthreading into account (52 is exactly half of the 104 tasks that srun handles)? All of this is quite confusing, as it seems to go against the Open MPI documentation, which recommends mpirun over srun for newer versions of Open MPI. In my environment I have Open MPI 5.0.8 with UCX v1.20.0. (I had to build my own Open MPI, as the GNU compilers preinstalled on the cluster are v4.8.5 and do not support C++17.)
In any case, I am trying to understand whether this problem is specific to LAMMPS, since the simple script (which requires no communication between tasks) runs without problems under srun. I am sorry if I am bothering you with an issue unrelated to LAMMPS; I decided to write here after a lot of back and forth with the cluster admins that brought no solution.
Thank you in advance
Christos