Getting qlaunch to submit to multiple queues

Hi!

Here’s the use case I’d like to support. I have four partitions on a SLURM-managed compute cluster that I can submit jobs to. Each of them has a different resource limit, and the limits are independent of each other (i.e. I can fill up my quota on every partition simultaneously with no problems). Of course, I could just submit to the partition where I have the largest allocation (which is what I’m doing right now), but that’s only about half the total resources I could be using if I could submit to all four.

Is there a way to get the rapidfire qlauncher to farm jobs out to a list of queues instead of just one? If I understand the logic in queue_launcher.py correctly, it seems like you could insert an iteration over qadapters right after the while True block at line 204 and have it work. Does that seem reasonable?

Thanks!

To do this, all you need is a different queue adapter for each partition.

So you have one cluster and multiple queues/partitions on it that you want to utilize. Say ~/fireworks_config/ is the location of your queue adapter and FWorker files. What I would suggest is to make a folder called “workers” or “partitions” inside that folder, and then a sub-folder for each partition containing a customized adapter for that partition. That’s a bit of a mouthful, so visually:

fireworks_config/
    db.json
    my_fworker.yaml
    my_qadapter.yaml
    launchpad.yaml
    FW_config.yaml
    workers/
        partition1/
            my_fworker.yaml
            my_qadapter.yaml
            FW_config.yaml
        partition2/
            my_fworker.yaml
            my_qadapter.yaml
            FW_config.yaml
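
For example, each partition’s my_qadapter.yaml is an ordinary FireWorks CommonAdapter file with that partition’s queue name filled in, and its FW_config.yaml just points at the files in that sub-folder. A minimal sketch for partition1 (the paths, account name, and resource numbers are placeholders to adapt):

    # workers/partition1/my_qadapter.yaml
    _fw_name: CommonAdapter
    _fw_q_type: SLURM
    rocket_launch: rlaunch -w /home/me/fireworks_config/workers/partition1/my_fworker.yaml -l /home/me/fireworks_config/launchpad.yaml singleshot
    queue: partition1      # the SLURM partition to submit to
    account: my_account    # placeholder allocation name
    ntasks: 24             # placeholder resource request
    walltime: '24:00:00'

    # workers/partition1/FW_config.yaml
    LAUNCHPAD_LOC: /home/me/fireworks_config/launchpad.yaml
    FWORKER_LOC: /home/me/fireworks_config/workers/partition1/my_fworker.yaml
    QUEUEADAPTER_LOC: /home/me/fireworks_config/workers/partition1/my_qadapter.yaml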

Now, in each of those folders you put the partition-specific queue information, as well as any partition-specific execution commands, inside the files. The FW_config.yaml in each folder would simply point to the files in that directory.

The last piece is a convenience I like: add an environment variable such as partition1=“-c /path/to/workers/partition1”, pointing at the directory containing that partition’s FW_config.yaml, so that I can call “qlaunch $partition1 singleshot” to submit specifically to that queue. If I am running many, many calculations and want to use rapidfire, I can simply call rapidfire for each partition and submit, say, 10 FWs to partition1, 20 FWs to partition2, and so on. And if you want all of your partitions automatically kept full of FWs at all times until none are left, I’d recommend using a crontab to periodically run rapidfire against each partition.
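
Concretely, that setup might look like this (paths, batch sizes, and the cron schedule are placeholders to adapt):

    # in ~/.bashrc: one variable per partition, pointing at its config dir
    export partition1="-c /home/me/fireworks_config/workers/partition1"
    export partition2="-c /home/me/fireworks_config/workers/partition2"

    # submit one job to a specific partition
    qlaunch $partition1 singleshot

    # submit a batch to each partition with rapidfire
    qlaunch $partition1 rapidfire --nlaunches 10
    qlaunch $partition2 rapidfire --nlaunches 20

    # crontab entries to keep every partition fed without intervention;
    # -m caps how many jobs qlaunch keeps sitting in each queue
    # (cron may need the full path to the qlaunch executable)
    */30 * * * * qlaunch -c /home/me/fireworks_config/workers/partition1 rapidfire -m 10
    */30 * * * * qlaunch -c /home/me/fireworks_config/workers/partition2 rapidfire -m 20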

One last thing: in my_fworker.yaml you can set the “name” to the partition name and the “category” to the name of the whole cluster. Then, if you want to submit a Firework specifically to that partition, you can specify the FWorker name in its execution options, and if you just want it to run on some partition of the cluster, you can specify the category instead.
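
A minimal sketch of such a worker file, with placeholder names:

    # workers/partition1/my_fworker.yaml
    name: partition1      # matched by a Firework's _fworker spec key
    category: my_cluster  # matched by a Firework's _category spec key
    query: '{}'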

I’ve never given much thought as to whether this is the best way to do it, but it’s how I started doing it, and I’ve found it to work very well for me.

-Nick

On Thursday, September 12, 2019 at 3:10:53 PM UTC-7, Varchas Gopalaswamy wrote:

Interesting - so is the recommended execution model to use Python scripts to create a massive set of Workflows and submit them to the MongoDB, and then use qlaunch in something like a crontab to periodically flush jobs out of the DB?

On Thursday, September 12, 2019 at 3:14:33 PM UTC-7, Nicholas Winner wrote:

I think that’s the way to do it if you’re running a large number of jobs. If you have hundreds of workflows consisting of thousands of FireWorks, you need to be pretty hands-off, so crontabs are a good way to submit batches at a time without intervention. If you only have to run, say, 10 workflows, you can get away with manually submitting jobs to the different partitions.
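
As a sketch of that hands-off model (run_my_sim and the batch size are placeholders; swap in your own Firetasks):

    from fireworks import Firework, LaunchPad, ScriptTask, Workflow

    # reads the launchpad.yaml connection info from your config
    lpad = LaunchPad.auto_load()

    # build and insert a large batch of workflows up front;
    # the cron-driven qlaunch rapidfire calls then drain them over time
    for i in range(500):
        fw = Firework(ScriptTask.from_str(f"run_my_sim --case {i}"),
                      name=f"sim-{i}")
        lpad.add_wf(Workflow([fw], name=f"sim-wf-{i}"))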


Also note that if you want, you can assign different workflows to different partitions:

https://materialsproject.github.io/fireworks/controlworker.html

See the “Controlling the Worker …” section.
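
As a sketch of what that page describes, using the reserved _fworker and _category spec keys together with the placeholder names from above:

    from fireworks import Firework, LaunchPad, ScriptTask, Workflow

    lpad = LaunchPad.auto_load()

    # runs only on the FWorker named "partition1"
    fw_pinned = Firework(ScriptTask.from_str('echo "pinned to partition1"'),
                         spec={"_fworker": "partition1"})

    # runs on any FWorker whose category is "my_cluster"
    fw_any = Firework(ScriptTask.from_str('echo "anywhere on the cluster"'),
                      spec={"_category": "my_cluster"})

    lpad.add_wf(Workflow([fw_pinned, fw_any]))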
