Best practices to launch specific workflows on specific compute resources

Chris · August 24, 2015, 1:43pm

Hello!

I am evaluating FireWorks as a workflow tool for a hybrid environment of LSF and SGE clusters, as well as for serial (no-cluster) workflows.

I do not see any functionality to launch specific workflows on specific resources. For instance, neither fireworks.queue.queue_launcher nor fireworks.core.rocket_launcher seems to have a way to say “launch this specific workflow [from the DB | that I just created]”.

Ideally, some workflows would go into the database and be picked up for cluster runs. Others, added by a user, could be run serially, and a daemon would know not to touch them.

Does this exist, and I haven’t found it? Or is this use case not supported?

Other than that, it seems pretty feature-rich, so we’d still consider contributing a patch or forking to allow this behaviour, if the code changes are relatively small.

Thanks!

CCH

Chris · August 24, 2015, 2:00pm

I guess one way to handle this without reworking the code is to have a separate MongoDB collection and/or database for serial workflows, or a process that creates one collection per user behind the scenes.

Am I thinking about that correctly? Even so, “run this workflow by ID” still seems like a desirable feature, as opposed to the generic rapidfire “run all ready workflows”.

Anubhav_Jain · August 24, 2015, 3:31pm

Hi,

There are many existing features to control where a job gets run, and to categorize different jobs for different workers in both simple and more complex ways. They are outlined in the second half of this tutorial:

https://pythonhosted.org/FireWorks/controlworker.html

Please take a look and let me know if that solves most if not all of your issue.

If you want to manually launch a specific FireWork, a few notes:

i) The “priority” determines which jobs get run first. If you set a FireWork’s priority very high, it will be run next: https://pythonhosted.org/FireWorks/priority_tutorial.html

ii) The “rlaunch singleshot” command actually has a “--fw_id” option that lets you run a specific FireWork.

iii) Note that the “qlaunch” command does not have a “--fw_id” option, but user gpetretto is working on a fork of FWS that adds that feature along with a number of other things. He will merge his code when it is ready; there is no due date, but I expect it in the next month or two.
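To make notes i) and ii) concrete, here is a toy model of how a launcher might choose the next job: given an explicit fw_id it runs exactly that READY FireWork, otherwise it takes the READY FireWork with the highest priority. This is a hypothetical plain-Python sketch, not FireWorks' actual query code. `_priority` is the spec key described in the priority tutorial; the dict layout, the lowest-fw_id tie-break, and treating a missing priority as lowest are assumptions made for illustration.

```python
from typing import Optional

# Toy stand-in for documents in the LaunchPad's fireworks collection (hypothetical layout).
FIREWORKS = [
    {"fw_id": 1, "state": "READY", "spec": {"_priority": 1}},
    {"fw_id": 2, "state": "READY", "spec": {}},  # no _priority set
    {"fw_id": 3, "state": "READY", "spec": {"_priority": 100}},
    {"fw_id": 4, "state": "WAITING", "spec": {"_priority": 500}},  # parents not finished yet
]


def pick_next(fireworks: list[dict], fw_id: Optional[int] = None) -> Optional[dict]:
    """Return the FireWork a launcher would run next in this toy model.

    - With fw_id (like `rlaunch singleshot --fw_id`): that FireWork, if it is READY.
    - Without it: the READY FireWork with the highest _priority; ties go to the
      lowest fw_id, and a missing _priority counts as lowest (both assumptions).
    """
    ready = [fw for fw in fireworks if fw["state"] == "READY"]
    if fw_id is not None:
        return next((fw for fw in ready if fw["fw_id"] == fw_id), None)
    if not ready:
        return None
    return min(
        ready,
        key=lambda fw: (-fw["spec"].get("_priority", float("-inf")), fw["fw_id"]),
    )


print(pick_next(FIREWORKS)["fw_id"])
```

In this model FireWork 4 has the highest priority but is skipped because it is still WAITING, which mirrors why raising a priority only helps once a FireWork is READY.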

Best,

Anubhav

–

You received this message because you are subscribed to the Google Groups “fireworkflows” group.

To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].

To view this discussion on the web visit https://groups.google.com/d/msgid/fireworkflows/023455d4-ff3a-4c5e-a9c8-f05b03aff2da%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Chris · August 24, 2015, 6:27pm

I think my title was poorly phrased.

We have taken care of resource requests within the grid compute environment by adding -R to the qsub/bsub headers that FireWorks supplies, and by supplying pre-formatted resource strings along with the jobs. This seems to remove the need to create static FireWorks jobs within the queue system or to name them, since the resource requests will find a node with the needed resources and execute the job there. Eventually we might go further and ask for memory/disk/arbitrary_resource as separate variables within the Firework object and construct the strings for the user, but for now this seems to work.
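The “separate variables, construct the strings for the user” idea could look roughly like the sketch below. The `resources` spec key and the `build_resource_args` helper are hypothetical, not part of FireWorks. The LSF branch assumes the `rusage[name=value:name=value]` resource-requirement syntax and the SGE branch the comma-separated `-l` list; resource names and units (e.g. `mem`, `tmp`, `h_vmem`) are site-specific and should be checked against each cluster's configuration.

```python
import shlex


def build_resource_args(resources: dict, scheduler: str) -> list[str]:
    """Turn a dict like {"mem": "4000", "tmp": "10000"} into scheduler CLI args.

    Hypothetical helper: names and values are passed through verbatim, so they
    must already match what the site's LSF or SGE configuration expects.
    """
    if not resources:
        return []
    # Sort so the generated string is stable from run to run.
    pairs = [f"{name}={value}" for name, value in sorted(resources.items())]
    if scheduler == "lsf":
        # e.g. bsub -R "rusage[mem=4000:tmp=10000]"
        return ["-R", f"rusage[{':'.join(pairs)}]"]
    if scheduler == "sge":
        # e.g. qsub -l h_vmem=4G,tmp_free=10G
        return ["-l", ",".join(pairs)]
    raise ValueError(f"unknown scheduler: {scheduler!r}")


# Hypothetical FireWork spec carrying per-job resource requests.
spec = {"resources": {"mem": "4000", "tmp": "10000"}}
print(shlex.join(["bsub", *build_resource_args(spec["resources"], "lsf")]))
```

Keeping the requests as structured data rather than pre-formatted strings means the same FireWork could be rendered for either scheduler.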

I think we can use the system as-is with that slight tweak, plus the queue launcher in infinite rapidfire mode, to launch production workflows: simply add a workflow to the production collection and wait. That said, as you note, a queue launcher that can relaunch a specific workflow ID (not just a FireWork ID, but a workflow ID) would be desirable in many circumstances. We can probably wait for the fork/patch to land and then see whether we need to add more code of our own.
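Until a workflow-level launch option exists, one stopgap is to resolve a workflow to the fw_ids of its READY FireWorks and launch each one with `rlaunch singleshot --fw_id`, the option mentioned earlier in the thread. The sketch below only builds the command lines; the in-memory `WORKFLOWS` mapping is a hypothetical stand-in for querying the LaunchPad. Because children only become READY after their parents complete, this would have to be repeated until the whole workflow has run.

```python
import shlex

# Hypothetical snapshot of LaunchPad state: workflow name -> {fw_id: state}.
WORKFLOWS = {
    "wf_alpha": {10: "COMPLETED", 11: "READY", 12: "READY", 13: "WAITING"},
    "wf_beta": {20: "READY"},
}


def singleshot_commands(workflow: str, workflows: dict = WORKFLOWS) -> list[list[str]]:
    """Build one `rlaunch singleshot --fw_id N` command per READY FireWork in `workflow`."""
    states = workflows[workflow]
    ready_ids = sorted(fw_id for fw_id, state in states.items() if state == "READY")
    return [["rlaunch", "singleshot", "--fw_id", str(fw_id)] for fw_id in ready_ids]


for cmd in singleshot_commands("wf_alpha"):
    # Printed here; a wrapper could run each with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```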

Another use case, though, is to take advantage of FireWorks’ DAG parsing but run entirely on one node: either in an interactive shell (e.g. “bsub -Is bash”, then “my_fireworks_workflow.py” and wait) or via “bsub my_fireworks_workflow.py”, which then runs all your jobs serially while respecting the workflow dependencies.
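The single-node case amounts to walking the workflow DAG in dependency order inside one process. As a library-agnostic illustration (not how FireWorks itself schedules work), Python's standard `graphlib.TopologicalSorter` (Python 3.9+) yields such an order. The task names and the `run_serially` helper are hypothetical. In FireWorks terms, running `rlaunch rapidfire` on that node would pull and run READY FireWorks one after another; the open question in this thread is how to restrict it to one workflow.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_serially(deps: dict[str, set[str]], tasks: dict[str, Callable[[], None]]) -> list[str]:
    """Run every task once, each only after all of its parents have run.

    deps maps a task name to the set of task names it depends on (its parents).
    Returns the order in which the tasks were executed.
    """
    # Every task is a node, even if it has no parents; deps supplies the edges.
    graph = {name: deps.get(name, set()) for name in tasks}
    order = []
    for name in TopologicalSorter(graph).static_order():  # raises graphlib.CycleError on cycles
        tasks[name]()  # a parent missing from `tasks` raises KeyError here
        order.append(name)
    return order


# Hypothetical diamond-shaped workflow: fetch -> (align, qc) -> report
log = []
tasks = {name: (lambda n=name: log.append(n)) for name in ["fetch", "align", "qc", "report"]}
deps = {"align": {"fetch"}, "qc": {"fetch"}, "report": {"align", "qc"}}
order = run_serially(deps, tasks)
print(order)
```

`align` and `qc` may run in either order, since neither depends on the other; only `fetch` first and `report` last are guaranteed.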

So for those, we specifically never want them to be picked up by the queue launcher in infinite mode. Based on the documentation you sent, I guess we could assign them a UUID as a category and then force the FireWorker that comes up for serial processing to match that UUID.
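A sketch of the UUID-as-category idea: stamp every FireWork of a serial workflow with a fresh category in its spec, then start the serial worker with that same category. `_category` is the spec key the controlworker tutorial linked earlier uses for this, but the helper names and `worker_accepts` are hypothetical, a simplified model of the worker's query. One caveat the model encodes, based on our reading of FireWorks' `FWorker`: a worker with an empty category does not filter on `_category` at all, so the production rapidfire worker would itself need a category (e.g. “production”) to be sure it never picks up these jobs.

```python
import uuid


def make_serial_category() -> str:
    """Hypothetical helper: a unique category string for one serial run."""
    return f"serial-{uuid.uuid4()}"


def tag_specs(specs: list[dict], category: str) -> list[dict]:
    """Return copies of the FireWork specs with `_category` set to `category`."""
    return [{**spec, "_category": category} for spec in specs]


def worker_accepts(worker_category: str, spec: dict) -> bool:
    """Toy model of category matching.

    An empty worker category applies no filter (accepts everything);
    otherwise the FireWork's `_category` must match exactly.
    """
    if not worker_category:
        return True
    return spec.get("_category") == worker_category


category = make_serial_category()
serial_specs = tag_specs([{"task": "align"}, {"task": "report"}], category)
production_spec = {"task": "nightly_build", "_category": "production"}

assert all(worker_accepts(category, s) for s in serial_specs)  # serial worker runs its own jobs
assert not worker_accepts(category, production_spec)  # ...and nothing else
assert not worker_accepts("production", serial_specs[0])  # categorized production worker skips them
assert worker_accepts("", serial_specs[0])  # an uncategorized worker would NOT skip them
```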

Another piece of this puzzle that I may have explained poorly: we want a production section and a user section. We never want a user to be able to run “lpad reset” on production, and we probably don’t want user workflows to be picked up by the generic production rapidfire in infinite mode. All user workflows should be “try once, then report back”, with “manual rerun” as an option. We’ll probably hide the FireWorks interface behind helper code to accomplish this, and expose a light command-line front end that gives users choices like “run in serial” and “run on grid.”

So I think grabbing a user’s Unix ID and forcing them into a collection of their own is probably a good idea. But for that to work in the long term, we would still want “queue launch this workflow ID”, “serial launch this workflow ID”, and “retry this workflow ID”, and the equivalent for the queue launcher.
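One way to sketch the per-user separation is to derive a database name from the caller's account and pass it to `LaunchPad`, whose `name` argument selects the MongoDB database, so each user gets a separate database rather than a separate collection. The naming scheme and the `db_name_for` / `launchpad_kwargs` helpers are hypothetical. Note that `getpass.getuser()` reads the login name from environment variables first, so it is a convenience rather than a security boundary; actually preventing users from resetting production would also require MongoDB authentication with per-user roles.

```python
import getpass
import re

# Characters MongoDB does not allow in database names (plus whitespace, to be safe).
_INVALID_DB_CHARS = re.compile(r'[/\\. "$*<>:|?\s]')


def db_name_for(username: str, production: bool = False) -> str:
    """Map a user (or the production pipeline) to a LaunchPad database name.

    Hypothetical scheme: "fw_production" for production, "fw_user_<name>" otherwise.
    """
    if production:
        return "fw_production"
    safe = _INVALID_DB_CHARS.sub("_", username)
    return f"fw_user_{safe}"


def launchpad_kwargs(host: str, port: int = 27017, production: bool = False) -> dict:
    """Keyword arguments intended for fireworks.LaunchPad(**kwargs)."""
    return {"host": host, "port": port, "name": db_name_for(getpass.getuser(), production)}


print(db_name_for("jdoe"))
```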

I’ll keep playing around with it and check back in if needed as we firm up our own ideas or after the queue code updates. Thanks!

CCH