I would like to enable recovery on rerun (via spec._recovery), which seems to me to be possible only with the --task-level flag. When the Firework is a root node, this works for any number of tasks, regardless of which task failed. When the Firework has parents (see the example below), recovery works only if the first task has completed; if the first task itself fails, the rerun fails with the error shown below. I have found that the error occurs because Rocket.run does not write a checkpoint to the launch in that specific case.
Question 1: Is this a necessary constraint or a bug?
Question 2: Is it possible to enable recovery upon rerun for one-task Fireworks, or for many-task Fireworks whose first task failed? The directory of the first task already contains usable data that could shorten the next run of that task.
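To illustrate what I mean by reusing data from the previous run, here is a minimal sketch of how a task could look for leftover files in the old launch directory. It assumes that the recovery information is available in the spec under _recovery and that it carries a _prev_dir entry (the key that get_recovery tries to set in the traceback further down). The helper name previous_run_files is hypothetical and not part of FireWorks.

```python
import os


def previous_run_files(spec):
    """Return paths of files left in the previous launch directory, if any.

    Hypothetical helper, not part of FireWorks. Assumes spec["_recovery"]
    is a dict with a "_prev_dir" entry when a recovery rerun was requested.
    """
    recovery = spec.get("_recovery") or {}
    prev_dir = recovery.get("_prev_dir")
    if not prev_dir or not os.path.isdir(prev_dir):
        return []  # fresh run: nothing to reuse
    return sorted(
        os.path.join(prev_dir, name)
        for name in os.listdir(prev_dir)
        if os.path.isfile(os.path.join(prev_dir, name))
    )
```

A task could call this at startup and skip the expensive part of its work when the expected intermediate files are already there.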
In [1]: from fireworks.fw_config import LAUNCHPAD_LOC
   ...: from fireworks import LaunchPad
   ...: from fireworks import Firework, Workflow
   ...: from fireworks.user_objects.firetasks.script_task import ScriptTask, PyTask
   ...: fw_1 = Firework([ScriptTask(script='echo Hello 1')])
   ...: fw_2 = Firework([PyTask(func='time.sleep', args=[20]), ScriptTask(script='echo Hello 2')])
   ...: wf = Workflow([fw_1, fw_2], links_dict={fw_1: fw_2})
   ...: lpad = LaunchPad.from_file(LAUNCHPAD_LOC)
   ...: lpad.add_wf(wf)
Out[1]: {-2: 1, -1: 2}
$ rlaunch rapidfire
Hello 1
^CInterrupted by signal 2
Traceback (most recent call last):
  File "/mnt/data/ubuntu/work/python-3.10.12/lib/python3.10/site-packages/fireworks/core/rocket.py", line 261, in run
    m_action = t.run_task(my_spec)
  File "/mnt/data/ubuntu/work/python-3.10.12/lib/python3.10/site-packages/fireworks/user_objects/firetasks/script_task.py", line 187, in run_task
    output = func(*args, **kwargs)
  File "/mnt/data/ubuntu/work/python-3.10.12/lib/python3.10/site-packages/fireworks/scripts/rlaunch_run.py", line 45, in handle_interrupt
    sys.exit(1)
SystemExit: 1
$ lpad rerun_fws --task-level --copy-data -i 1
Traceback (most recent call last):
  File "/mnt/data/ubuntu/work/python-3.10.12/bin/lpad", line 7, in <module>
    sys.exit(lpad())
  File "/mnt/data/ubuntu/work/python-3.10.12/lib/python3.10/site-packages/fireworks/scripts/lpad_run.py", line 1578, in lpad
    args.func(args)
  File "/mnt/data/ubuntu/work/python-3.10.12/lib/python3.10/site-packages/fireworks/scripts/lpad_run.py", line 640, in rerun_fws
    lp.rerun_fw(int(fw_id), recover_launch=l_id, recover_mode=args.recover_mode)
  File "/mnt/data/ubuntu/work/python-3.10.12/lib/python3.10/site-packages/fireworks/core/launchpad.py", line 1725, in rerun_fw
    recovery = self.get_recovery(fw_id, recover_launch)
  File "/mnt/data/ubuntu/work/python-3.10.12/lib/python3.10/site-packages/fireworks/core/launchpad.py", line 1769, in get_recovery
    recovery.update(_prev_dir=launch.launch_dir, _launch_id=launch.launch_id)
AttributeError: 'NoneType' object has no attribute 'update'
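The AttributeError suggests that the checkpoint looked up for the launch is None when the first task never finished. Below is a minimal, self-contained sketch of the kind of guard I have in mind. It does not use the FireWorks code itself: build_recovery and default_checkpoint are hypothetical names, and I am assuming that the checkpoint is read from the last entry of the launch's state history. I have not verified whether an empty recovery is sufficient for Rocket.run to restart the first task in the old directory.

```python
def build_recovery(state_history, launch_dir, launch_id, default_checkpoint=None):
    """Build a recovery dict even when the last state has no checkpoint.

    Hypothetical stand-in for the logic around get_recovery; the layout of
    state_history (a list of dicts with an optional "checkpoint" key) is
    an assumption.
    """
    last_state = state_history[-1] if state_history else {}
    checkpoint = last_state.get("checkpoint")
    if checkpoint is None:
        # No checkpoint was written (e.g. the first task was interrupted):
        # fall back to a default instead of calling .update() on None.
        checkpoint = default_checkpoint or {}
    recovery = dict(checkpoint)  # copy so the stored history is not mutated
    recovery.update(_prev_dir=launch_dir, _launch_id=launch_id)
    return recovery
```

With a guard like this, the rerun would at least carry _prev_dir into the spec, which is the piece needed to reuse data from the interrupted first task.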