launch_multiprocess

AJ,

Sorry to bother you again; one more quick question. We are using launch_multiprocess to run multiple fireworks in parallel, and there is one issue I would like your advice on. Assume the workflow has 16 fireworks that need to be run. I set the launch_multiprocess parameters to nlaunches=1, num_jobs=16, which creates 16 processes, enough to finish all of my fireworks.

The issue arises if one firework fails partway through and I have to rerun it with lpad rerun_fws. The firework is rerun, but the rerun takes up one of my 16 processes, which was meant for another firework. As a result, one of the fireworks still in the queue never runs, because no process is left for it.

If I instead set nlaunches=1, num_jobs=20, then when everything finishes without failures, 4 extra processes are left hanging forever, even though the workflow has already finished.

So what do you suggest? How should I run my jobs using launch_multiprocess? Will nlaunches=0, num_jobs=16 work? Thanks so much.

Wengang

Hi Wengang,

I think nlaunches=0, num_jobs=16 should work: each process keeps pulling jobs until the workflow is completed. The only problem is if you don’t execute the “rerun” before the other jobs finish. Once there are no more ready jobs (i.e., the fireworks are all COMPLETED or FIZZLED), even nlaunches=0 will quit, so you need to rerun before everything is completed.
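To make the counting concrete, here is a toy model in plain Python. It is not FireWorks code: simulate and its arguments are hypothetical names, and it only encodes the behavior described in this thread (with nlaunches=1 each process launches one job and then stops or waits; with nlaunches=0 each process keeps pulling jobs while any are ready). It also assumes any rerun is issued while other jobs are still running.

```python
def simulate(num_procs, one_launch_each, launches_needed):
    """Toy launch-budget model (hypothetical, not part of FireWorks).

    one_launch_each=True  models nlaunches=1: every process launches
                          exactly one job.
    one_launch_each=False models nlaunches=0: processes keep launching
                          until nothing is ready.
    Returns (jobs launched, jobs never launched, processes left waiting).
    """
    if one_launch_each:
        launched = min(num_procs, launches_needed)
        return launched, launches_needed - launched, num_procs - launched
    # Assumes reruns were issued before the pool ran out of ready jobs.
    return launches_needed, 0, 0


# 16 fireworks plus 1 rerun means 17 launches are needed.
print(simulate(16, True, 17))   # (16, 1, 0): one firework never runs
print(simulate(20, True, 16))   # (16, 0, 4): four processes wait forever
print(simulate(16, False, 17))  # (17, 0, 0): the rerun is absorbed
```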

If you set nlaunches=-1, the processes will keep running indefinitely, so you can rerun whenever you want.
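A minimal sketch of the two settings, assuming launch_multiprocess is importable from fireworks.features.multi_launcher and takes (launchpad, fworker, loglvl, nlaunches, num_jobs, sleep_time) in that order; please check the signature in your installed version. choose_nlaunches and run_pool are hypothetical helper names, and the 60-second sleep_time is an arbitrary choice.

```python
def choose_nlaunches(keep_alive):
    """Hypothetical helper: pick the nlaunches value discussed above.

    0  -> each process stops once no fireworks are ready.
    -1 -> each process keeps polling, so a late rerun is still picked up.
    """
    return -1 if keep_alive else 0


def run_pool(num_jobs=16, keep_alive=False, sleep_time=60):
    """Start num_jobs parallel launcher processes (hypothetical wrapper)."""
    # Imported here so the module loads without FireWorks installed;
    # running this function needs FireWorks and a reachable LaunchPad.
    from fireworks import FWorker, LaunchPad
    from fireworks.features.multi_launcher import launch_multiprocess

    launchpad = LaunchPad.auto_load()  # reads your my_launchpad.yaml
    launch_multiprocess(
        launchpad,
        FWorker(),
        "INFO",                        # loglvl
        choose_nlaunches(keep_alive),  # nlaunches
        num_jobs,                      # number of parallel processes
        sleep_time,                    # seconds between polls
    )
```

With keep_alive=True the pool never exits on its own, so you would have to stop it yourself once the workflow is done.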

Alternatively, you can write your workflow to catch errors using the “_allow_fizzled_parents” option combined with dynamic workflow programming:

https://pythonhosted.org/FireWorks/failures_tutorial.html

https://pythonhosted.org/FireWorks/reference.html

The _allow_fizzled_parents option isn’t documented very well, but if you put it in the spec of a child FW, that FW will run regardless of whether the parent completed successfully or not. That child FW can then use dynamic FWActions to create more jobs as needed.
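A rough sketch of such a parent/child pair, under a few assumptions: the echo commands are placeholders for your real tasks, the fw_ids 1 and 2 are arbitrary, and recovery_spec and build_workflow are hypothetical names. Only the "_allow_fizzled_parents" spec key is the actual option being discussed.

```python
def recovery_spec():
    """Spec for a child FW that should run even if its parent FIZZLED."""
    return {"_allow_fizzled_parents": True}


def build_workflow():
    """Build a two-step workflow: a main job and a recovery child."""
    # Imported here so the module loads without FireWorks installed.
    from fireworks import Firework, ScriptTask, Workflow

    # Placeholder tasks; substitute your own Firetasks.
    parent = Firework(ScriptTask.from_str('echo "main job"'),
                      name="main_job", fw_id=1)
    child = Firework(ScriptTask.from_str('echo "check and recover"'),
                     spec=recovery_spec(), name="recovery", fw_id=2)

    # links_dict maps parent fw_id -> list of child fw_ids.
    return Workflow([parent, child], {1: [2]})
```

In a real workflow the child would run a custom Firetask rather than a ScriptTask, so that it can inspect what happened and return an FWAction that adds the extra jobs.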

Best,

Anubhav


On Mon, Jul 20, 2015 at 9:31 AM, [email protected] wrote:
