AJ,
Sorry to bother you again with one more quick question. We are using launch_multiprocess to run multiple fireworks in parallel, and I have run into an issue I'd like your advice on. Suppose the workflow contains 16 fireworks that need to run. I set the launch_multiprocess parameters to nlaunches=1, num_jobs=16, which creates 16 processes in total, exactly enough to finish all my fireworks.
The problem is that if a firework fails partway through, I may have to rerun it with lpad rerun_fws. When I do, the firework does get rerun, but it uses up one of the 16 processes that was meant for another firework. As a result, one of the fireworks still in the queue never runs, because no process is left for it at the end.
If I instead set nlaunches=1, num_jobs=20, then when everything succeeds, the 4 extra processes stay hanging forever even though the workflow has already finished.
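To make the counting concrete, here is a toy model of what I think is happening. It is plain Python, not FireWorks code, and it assumes (my assumption, not something I've confirmed in the FireWorks source) that with nlaunches=1 each process launches exactly one firework, and that a rerun firework goes back to the end of the queue.

```python
from collections import deque


def simulate(num_jobs, fireworks, failed_once):
    """Toy model: each of num_jobs processes launches at most one firework.

    Not FireWorks' real scheduler; assumes nlaunches=1 means one launch per
    process and that a rerun firework is re-queued at the back.
    """
    queue = deque(fireworks)
    failing = set(failed_once)  # copy so the caller's set is not modified
    completed = []
    idle_processes = 0
    for _ in range(num_jobs):
        if not queue:
            # Nothing left to run: this process just sits there waiting.
            idle_processes += 1
            continue
        fw = queue.popleft()
        if fw in failing:
            # The firework fails; after a rerun it is READY again, but this
            # process has already used its single launch.
            failing.discard(fw)
            queue.append(fw)
        else:
            completed.append(fw)
    return completed, list(queue), idle_processes


fws = [f"fw{i}" for i in range(16)]
print(simulate(16, fws, {"fw3"}))  # one firework is left stranded
print(simulate(20, fws, set()))    # all done, but 4 processes sit idle
```

With 16 processes and one failure, fw3 ends up stranded; with 20 processes and no failures, 4 processes are left idle, which are the two situations described above.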
So what do you suggest? How should I run my jobs with launch_multiprocess? Would nlaunches=0, num_jobs=16 work? Thanks so much.
Wengang