How to use EMC with SLURM

How exactly do you get EMC to run on a remote HPC with SLURM?

Does EMC, in fact, support SLURM? In the documentation I only see LSF and PBS mentioned, but I recall hearing in a LAMMPS workshop that it works with SLURM out of the box.

I have an HPC configured with SLURM, and I log onto it using a key file. I am trying to execute an EMC script that builds locally but runs the simulations on that HPC; however, I am not sure how to accomplish that.
For example, I have set the QUEUE_ACCOUNT and QUEUE_USER variables, but how do I pass the key file? Is there anything else I need to do before executing a run?
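For reference, I currently reach the cluster by pointing ssh at the key file through an entry in ~/.ssh/config, roughly like the one below (host name, user name, and path are placeholders), in case whatever EMC does for remote submission simply goes through ssh:

Host my-hpc
    HostName hpc.example.edu
    User myusername
    IdentityFile ~/.ssh/hpc_key
    # with this entry, "ssh my-hpc" (and anything that shells out to ssh)
    # picks up the key automatically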

So I have made some progress on this, but I would still appreciate help from anyone who has run into this issue before or has successfully done a simulation workflow in SLURM.

I am now just trying to do the whole thing on the queuing system (although I would still love to know how to chain in the building locally).

It seems that EMC is not detecting that my system is SLURM, because "which bsub" and "which qsub" both return something other than an empty string. Due to the order of the if/elif statements in the system_queue() function in run.sh, it therefore defaults to assuming LSF.
So I modified this function.

There is clearly a ${system} variable in run.sh, but it is not clear to me where exactly it is set. I ended up just forcing it to "slurm".
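For anyone else running into this, the change I made amounts to testing for sbatch before bsub/qsub and setting ${system} accordingly. A rough sketch of the idea (names and structure here are illustrative, not the literal code from run.sh):

# sketch only: check for sbatch first so SLURM wins even when
# bsub/qsub wrapper commands also exist on the cluster
detect_queue() {
  if [ -n "$(command -v sbatch)" ]; then system="slurm";
  elif [ -n "$(command -v qsub)" ]; then system="pbs";
  elif [ -n "$(command -v bsub)" ]; then system="lsf";
  else system="none"; fi;
}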
Then, the commands in slurm_run() are still written with qsub rather than sbatch, which is strange. I'm not sure if this would normally work, but I was able to get the build stage to run on my queue by modifying part of this function to:
slurm_run() {

  echo "sbatch ${options}" "${user[@]}" \
    "--export=ALL,nprocs=${n},command=\"${command}\"" \
    "-e $(pwd)/${project}.e -o $(pwd)/${project}.o ${subscript}";

  # Execute the sbatch command
  eval sbatch ${options} ${user[@]} \
    --export=ALL,nprocs=${n},command="\"${command}\"" \
    -e $(pwd)/${project}.e -o $(pwd)/${project}.o ${subscript};

  sleep ${sleep};
}
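For context on the ALL I added to --export: as far as I understand the sbatch semantics, --export=NAME=value passes only the named variables (plus SLURM_* variables) into the job environment, whereas --export=ALL,NAME=value also propagates the submitting shell's full environment, so modules loaded at submission time remain visible inside the job. Roughly (values are illustrative):

sbatch --export=nprocs=8,command="echo test" job.sh       # job sees only nprocs and command (plus SLURM_*)
sbatch --export=ALL,nprocs=8,command="echo test" job.sh   # job additionally inherits the full submission environment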

I have not been able to successfully generate the run commands for LAMMPS, however, as my LAMMPS module's executable is not called lmp_${HOST}, and editing the run_lammps() function in emc_setup.pl does not fix it.
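In case it helps with diagnosing this, here is roughly how I checked which LAMMPS commands the module actually provides (the module name "lammps" is just what my site happens to call it):

module avail lammps        # list the LAMMPS modules available on the cluster
module load lammps         # load the site's default LAMMPS module
compgen -c lmp | sort -u   # bash builtin: every command in PATH starting with "lmp"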

Hi Wyatt,

I apologize for SLURM not working correctly. As a caveat, I have to admit that I do not have access to a system with SLURM. As you have found out, run.sh functions as a wrapper around submission to three queueing systems: LSF, PBS, and SLURM. It tries to guess which queueing system you have by checking what submission commands are available. It does this in the function system_queue(), where run.sh checks whether bsub, qsub, or sbatch is in your path to identify LSF, PBS, or SLURM respectively. This of course all hinges on you executing run.sh on the cluster on which you are submitting your jobs.

A function slurm_run() is already present in the distributed run.sh, which used https://slurm.schedmd.com/rosetta.pdf to translate PBS into SLURM commands. I noticed that, in the current distribution, this function calls qsub instead of sbatch. The correct definition of slurm_run() should be

slurm_run() {
  local options="--job-name ${project}";
  local command="-submit slurm $@";

  command="$(echo "${command}" | sed 's/"/\\"/g')";
  
  if [ "${starttime}" != "now" -a "${starttime}" != "" ]; then
    options="${options} --begin=$(set_time 3 ${starttime})"; fi;
  if [ "${walltime}" != "" ]; then
    options="--time=$(set_time 3 ${walltime})"; fi;
  if [ "${wait}" != "" ]; then
    options="${options} --dependency=afterany:$wait"; fi;
  if [ "${queue}" != "" -a "${queue}" != "default" ]; then
    options="${options} -p ${queue}"; fi;
  if [ "${account}" != "" -a "${account}" != "none" ]; then
    options="${options} --account=${account}"; fi;

  # note: add ALL to --export?

  echo "sbatch ${options}" "${user[@]}" \
    "-n ${nprocs}" \
    "--export=nprocs=${n},command=\"${command}\"" \
    "-e $(pwd)/${project}.e -o $(pwd)/${project}.o ${subscript}";
  eval sbatch ${options} ${user[@]} \
    -n ${nprocs} \
    --export=nprocs=${n},command="\"${command}\"" \
    -e $(pwd)/${project}.e -o $(pwd)/${project}.o ${subscript};
  sleep ${sleep};
}
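For orientation, with purely illustrative values (project test, 16 processes, partition standard, account abc123, and a submission script run_test.sh, none of which come from EMC itself), the submission this function performs is roughly equivalent to (line breaks added for readability):

sbatch --job-name test -p standard --account=abc123 -n 16 \
  --export=nprocs=16,command="-submit slurm ..." \
  -e /path/to/workdir/test.e -o /path/to/workdir/test.o run_test.sh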

I’m including a corrected version, to be placed in ${EMC_ROOT}/scripts:
run.sh (21.5 KB)

You could use run.sh directly to submit a small test script to your system, with which you can check that submission works. This test script could, for instance, hold nothing more than

#!/bin/bash

echo "Hello world!";

I know that the above works for both LSF and PBS. Can you be more precise about what exactly does not seem to work for SLURM?

You could add a symbolic link to your system's LAMMPS executable as a workaround for the lmp_${HOST} issue. I use ${HOME}/bin for all such oddities. You could do the following

cd ${HOME}/bin
ln -s your/systems/lammps/lmp lmp_${HOST}

and add ${HOME}/bin to your path in your .bashrc, e.g. by adding

export PATH=${PATH}:${HOME}/bin

at the end of your .bashrc. Might I ask what the LAMMPS executable is called on your system?
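As a quick sanity check after creating the link, and assuming ${HOST} expands on your cluster to the same name that emc_setup.pl uses, something like the following should confirm that the wrapper name resolves:

which lmp_${HOST}            # should point at the link in ${HOME}/bin
lmp_${HOST} -h | head -n 3   # LAMMPS prints its version at the top of its help text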