I am looking for a method to dynamically generate parallel fireworks as part of a larger serial workflow. The use case involves taking an input PDF and splitting it into its constituent pages, each of which gets passed into its own parallel firework for further per-page processing. After the completion of all parallel fireworks, I would like to run one final firework. Of course, the number of pages in an incoming PDF can only be determined at runtime, so I can’t hardcode the DAG into a Workflow object and then dynamically edit the links through FWActions. It seems like I would need a combination of the additions and detours kwargs to create this dynamic diamond workflow, but so far I haven’t been able to get anything working. Creating an initial DAG with two fireworks in series and then adding parallel fireworks as detours after the completion of the first firework not only breaks the expected parallelism of the detours (this was verified by going into mongo and finding that the parallel fireworks all end up linked with each other, essentially creating serial dependencies) but also causes the second firework to become a sibling of the newly added parallel fireworks, when I need guaranteed completion of the parallel fireworks before the final firework is triggered.
Thanks for your email and for reaching out. The strategy of having the two FWs (initial and final), and using “detours” to create the intermediate FWs should have worked as far as I understand your question.
I just wrote some basic code to test things out and confirmed the problem (FWS v0.96) that if there are multiple detours, that they end up having dependencies between them rather than being siblings of one another that can run in parallel. However, I did not get the problem that the second FW becomes a sibling of the detours; in my example, the second FW always executes last and is dependent on the initial FW and the detours.
I will try to correct the “detours” feature to work as intended and push a new release as soon as possible; thank you for identifying this bug.
Here’s the code I am using to test - let me know if it doesn’t represent the same situation as your example:
from fireworks.utilities.fw_utilities import explicit_serialize from fireworks.core.firework import FireTaskBase, FWAction, Workflow, Firework from fireworks.core.launchpad import LaunchPad from fireworks.core.rocket_launcher import launch_rocket, rapidfire from fireworks.user_objects.firetasks.script_task import ScriptTask @explicit_serialize class StartJob(FireTaskBase): def run_task(self, fw_spec): print 'Running the INITIAL FW' dt1 = Firework(ScriptTask.from_str('echo "this is intermediate job 1"' )) dt2 = Firework(ScriptTask.from_str('echo "this is intermediate job 2"' )) dt3 = Firework(ScriptTask.from_str('echo "this is intermediate job 3"' )) return FWAction(detours=[dt1, dt2, dt3]) @explicit_serialize class EndJob(FireTaskBase): def run_task(self, fw_spec): print 'Running the FINAL FW' if __name__ == '__main__' : lp = LaunchPad() # lp.reset('', require_password=False) fw1 = Firework([StartJob()]) fw2 = Firework([EndJob()], parents=[fw1]) wf = Workflow([fw1, fw2]) lp.add_wf(wf) rapidfire(lp)
That’s equivalent to my setup, the only difference being I have multiple instances of rapidfire(lp, nlaunches=-1) running to simulate independent workers. I also misspoke about the second FW becoming a sibling of the parallel FWs, as that occurred when I was experimenting with combinations of additions and detours; it’s expected that FWs added as additions and those added as detours would be siblings of each other.
Thanks again for the prompt response - looking forward to using Fireworks in its full capacity.
Ok, this issue should be fixed and released with a new test put in as of FW v0.97. Let us know if you still have issues with it afterward.
Preliminary usage after the update has been great. Will keep you posted if anything comes up. Thanks again for your help.