Support for SLURM with multiple clusters?

Hi,

I’m trying out FireWorks on a cluster that uses the multi-cluster feature of the SLURM queuing system.

https://slurm.schedmd.com/multi_cluster.html

I can customize the submission script to include the “clusters” option, but it seems that I also have to tweak the CommonAdapter to work with this setup.

  1. When submitting jobs with the “-M” option, SLURM reports the job ID like this:

sbatch -A snic2018-1-33 -M snowy test

Submitted batch job 257467 on cluster snowy


I think this makes the CommonAdapter fail to retrieve the job ID:

https://github.com/materialsproject/fireworks/blob/d6ed387ca9c9b5deefb430c2d753e3753102ca86/fireworks/user_objects/queue_adapters/common_adapter.py#L78

  2. When querying job info with the “-M” option, an extra line is printed:

squeue -u yunqi -M snowy -h

CLUSTER: snowy

257471 core test yunqi R 0:05 1 s156


I think this will cause the adapter to count the “CLUSTER:” header as a job and report the wrong number of jobs:

https://github.com/materialsproject/fireworks/blob/d6ed387ca9c9b5deefb430c2d753e3753102ca86/fireworks/user_objects/queue_adapters/common_adapter.py#L137
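
Both cases could probably be handled by making the parsing tolerant of the extra text. Here is a minimal Python sketch of that idea. It is not the actual CommonAdapter code: the function names `parse_sbatch_output` and `count_squeue_jobs` are hypothetical, and the sketch assumes the job ID always follows the words "Submitted batch job" and that the extra header line always starts with "CLUSTER:", as in the outputs above.

```python
import re

# Matches both "Submitted batch job 257467" and
# "Submitted batch job 257467 on cluster snowy".
_SBATCH_JOB_RE = re.compile(r"Submitted batch job (\d+)(?: on cluster (\S+))?")


def parse_sbatch_output(output_str):
    """Return (job_id, cluster_name) parsed from sbatch output.

    cluster_name is None when sbatch was called without -M.
    Hypothetical helper, not part of FireWorks.
    """
    for line in output_str.splitlines():
        match = _SBATCH_JOB_RE.search(line)
        if match:
            return int(match.group(1)), match.group(2)
    raise ValueError("No job ID found in sbatch output: %r" % output_str)


def count_squeue_jobs(output_str):
    """Count job lines in `squeue -h` output.

    Blank lines and the "CLUSTER: <name>" header printed with -M
    are skipped. Hypothetical helper, not part of FireWorks.
    """
    count = 0
    for line in output_str.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("CLUSTER:"):
            continue
        count += 1
    return count


if __name__ == "__main__":
    print(parse_sbatch_output("Submitted batch job 257467 on cluster snowy"))
    print(count_squeue_jobs(
        "CLUSTER: snowy\n257471 core test yunqi R 0:05 1 s156\n"
    ))
```

Using a regular expression rather than taking the last word of the line keeps the single-cluster output working unchanged, so the same code path could serve both setups.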

Would it be of interest to modify the CommonAdapter to support multi-cluster setups?

/Yunqi

Hi Yunqi,

Although we don’t use multi-cluster ourselves, it would certainly be of interest to have this feature in FireWorks. If you were to make the appropriate code modifications, we’d be happy to integrate it into the main repo. From your email, it seems like it’s just a couple of changes.

Let me know if you’d like to do this, and if you need any help with the code design.

Best,

Anubhav
