With the help of Xiaohui Qu, we noticed one important follow-up to this conversation. The multi-launch script works by spawning multiple parallel workers (e.g., one per core when parallelizing over a single processor). Each worker behaves as it normally would, but the combination can sometimes produce behavior that looks strange.
For example, if you have a diamond-shaped workflow:
1 -> 2,3 -> 4
and you run multi-launch with num_jobs = 2, you would expect Fireworks 2 and 3 to run in parallel over the 2 workers. However, what happens in practice is:
1. Two parallel workers start at the beginning.
2. Worker A starts Firework 1.
3. Worker B sees nothing to run (Firework 1 is not yet finished, and every other Firework depends on it).
4. Because nlaunches is 0 (the default), which tells a worker to stop as soon as nothing is available to run, Worker B quits.
5. Now only Worker A is left, so Fireworks 2 and 3 run one after the other rather than in parallel.
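To make the sequence above concrete, here is a toy simulation in plain Python. It is not FireWorks code: `simulate`, `PARENTS`, and the `quit_when_idle` flag are made-up names for illustration, and it assumes every job takes exactly one "tick". It only mimics the rule that a worker either exits as soon as nothing is ready (like nlaunches=0) or keeps waiting.

```python
# Toy model of the behavior described above; this is NOT FireWorks code.
# Dependencies of the diamond workflow 1 -> 2,3 -> 4.
PARENTS = {1: [], 2: [1], 3: [1], 4: [2, 3]}


def simulate(num_workers, quit_when_idle):
    """Step a toy scheduler one tick at a time; each job takes one tick.

    Returns a list of (tick, {worker: job}) pairs, one per tick.
    """
    done, claimed = set(), set()
    alive = list(range(num_workers))
    timeline = []
    tick = 0
    while len(done) < len(PARENTS) and alive:
        running = {}
        for worker in list(alive):
            # A job is ready when it is unclaimed and all its parents are done.
            ready = [j for j in sorted(PARENTS)
                     if j not in claimed and all(p in done for p in PARENTS[j])]
            if ready:
                claimed.add(ready[0])
                running[worker] = ready[0]
            elif quit_when_idle:
                # Mimics nlaunches=0: nothing is ready right now, so exit for good.
                alive.remove(worker)
        # Jobs started this tick only count as finished at the end of the tick.
        done.update(running.values())
        timeline.append((tick, running))
        tick += 1
    return timeline


if __name__ == "__main__":
    for label, flag in (("quit when idle", True), ("keep waiting", False)):
        print(label)
        for tick, running in simulate(2, quit_when_idle=flag):
            print("  tick", tick, running)
```

In this toy model, two workers that quit when idle need 4 ticks (everything ends up on one worker), while two workers that keep waiting need 3 ticks, because jobs 2 and 3 share a tick.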
The easiest fix is to set nlaunches to “infinity” or to a large number. In the future it might be nice to have other options, e.g. to have Worker B quit only if no jobs can be found for N minutes rather than quitting immediately when there are none (or, better, only if there is nothing left waiting to run within the constraints of Worker B).
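The "quit only after N minutes without jobs" idea could look roughly like the sketch below. This is a hypothetical illustration, not an existing FireWorks option: `run_worker`, `fetch_job`, `run_job`, and `idle_timeout_s` are assumed names, and the clock and sleep functions are passed in only so the loop can be exercised without real waiting.

```python
import time


def run_worker(fetch_job, run_job, idle_timeout_s, poll_interval_s=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Run jobs until none has been available for idle_timeout_s seconds.

    fetch_job() returns a job, or None when nothing is ready right now.
    Returns the number of jobs this worker ran. (Hypothetical helper.)
    """
    launched = 0
    idle_since = None  # time at which the current idle stretch began
    while True:
        job = fetch_job()
        if job is not None:
            run_job(job)
            launched += 1
            idle_since = None  # any work resets the idle timer
            continue
        now = clock()
        if idle_since is None:
            idle_since = now
        elif now - idle_since >= idle_timeout_s:
            return launched  # idle for too long: give up
        sleep(poll_interval_s)


if __name__ == "__main__":
    # Fake queue: nothing ready at first, then one job, then nothing again.
    pending = [None, None, "firework-2"]
    ran = run_worker(lambda: pending.pop(0) if pending else None,
                     lambda job: print("running", job),
                     idle_timeout_s=0.3, poll_interval_s=0.1)
    print("jobs run:", ran)
```

With a loop like this, Worker B in the diamond example would survive the short wait for Firework 1 and still exit on its own once the workflow is finished.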
I hope this helps address some of the issues we were seeing in a private conversation.
On Wednesday, March 30, 2016 at 10:52:28 PM UTC-7, ajain wrote:
If you are running on a single machine, typically you want to just use a single Worker unless you have different job types and want the same machine to handle two different categories of jobs differently. In your case, I think you want to keep a single Worker but use the multi-launch:
Let me know if it doesn’t solve your problem.
On Wed, Mar 30, 2016 at 9:44 PM, Roman M wrote:
Stumbled upon Fireworks today, and have a newbie question. Thanks in advance.
I have eight cores on my machine, and my workflow has a diamond shape: I first split the work into 1000 tasks, run those 1000 in parallel, and then join the results together. I don’t quite understand where to specify the number of concurrent workers; I want to set it to 8.
If someone can point me to an example in python, that would be great.