Fizzled firework cannot rerun

Hi there,

I have a firework (fw_id 14980) that has fizzled. I can run lpad get_fws -i 14980 and successfully get its status:

{
    "fw_id": 14980,
    "created_on": "2024-02-10T05:05:40.673328",
    "updated_on": "2024-02-27T20:45:43.873777",
    "state": "FIZZLED",
    "name": "store_inputs"
}

However, when I try to run lpad rerun_fws -i 14980, it fails with the error:

Traceback (most recent call last):
  File "/dartfs-hpc/rc/home/x/anaconda3/envs/ht_defects/bin/lpad", line 8, in <module>
    sys.exit(lpad())
  File "/dartfs-hpc/rc/home/x/anaconda3/envs/ht_defects/lib/python3.10/site-packages/fireworks/scripts/lpad_run.py", line 1551, in lpad
    args.func(args)
  File "/dartfs-hpc/rc/home/x/anaconda3/envs/ht_defects/lib/python3.10/site-packages/fireworks/scripts/lpad_run.py", line 641, in rerun_fws
    lp.rerun_fw(int(f), recover_launch=l, recover_mode=args.recover_mode)
  File "/dartfs-hpc/rc/home/x/anaconda3/envs/ht_defects/lib/python3.10/site-packages/fireworks/core/launchpad.py", line 1714, in rerun_fw
    with WFLock(self, fw_id):
  File "/dartfs-hpc/rc/home/x/anaconda3/envs/ht_defects/lib/python3.10/site-packages/fireworks/core/launchpad.py", line 129, in __enter__
    raise ValueError(f"Could not find workflow in database: {self.fw_id}")
ValueError: Could not find workflow in database: 14980

Does anyone have a clue about this? Thanks in advance!

Hi @Zhenkun_Yuan, you might have a lock on your firework that is preventing it from being rerun. Try running:

lpad admin unlock -i 14980

before trying to rerun the firework again.

Thanks a lot, @Aaron_Kaplan! Unfortunately, I still get the same error. Hmm.

Is this firework the only one in the workflow? If there are other fireworks in the flow and they have fizzled or are stuck (run lpad detect_lostruns to check), there might be other issues at hand.

Are you able to find the workflow corresponding to the FW?

lpad get_wfs -i 14980
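
If it is easier to check from Python, the lookup above can be sketched roughly as follows. This is only an illustration: it assumes that each workflow document keeps the IDs of its fireworks in a "nodes" list (which, as far as I know, is how FireWorks stores them), and the helper name workflow_for_fw and the sample documents are made up.

```python
def workflow_for_fw(workflow_docs, fw_id):
    """Return the first workflow document whose 'nodes' list contains fw_id, else None."""
    for doc in workflow_docs:
        if fw_id in doc.get("nodes", []):
            return doc
    return None


# Hypothetical sample documents. With a live database they could instead come
# from something like LaunchPad.auto_load().workflows.find({}, {"nodes": 1})
# (an assumption about your setup, not something shown in this thread).
sample_workflows = [
    {"name": "wf_a", "nodes": [14001, 14002]},
    {"name": "wf_b", "nodes": [15001]},
]

print(workflow_for_fw(sample_workflows, 14002))  # the wf_a document
print(workflow_for_fw(sample_workflows, 14980))  # None: no workflow lists this fw_id
```

A None result here would correspond to the "Could not find workflow in database" message in the traceback.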

@Anubhav_Jain @Aaron_Kaplan Thanks for helping. It looks like the workflow associated with this fw_id has been lost: lpad get_wfs -i 14980, or the same command with other fw_ids, returns [] (i.e., empty). I checked other workflows and they have no problems.
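
In case it helps someone who hits the same situation, here is a minimal sketch of how one might list fireworks that no workflow refers to any more. It assumes firework documents carry an "fw_id" field and workflow documents carry a "nodes" list of fw_ids; the function name orphaned_fw_ids and the sample data are hypothetical.

```python
def orphaned_fw_ids(firework_docs, workflow_docs):
    """Return the sorted fw_ids that do not appear in any workflow's 'nodes' list."""
    referenced = set()
    for wf in workflow_docs:
        referenced.update(wf.get("nodes", []))
    return sorted(fw["fw_id"] for fw in firework_docs if fw["fw_id"] not in referenced)


# Hypothetical sample data standing in for the fireworks and workflows collections.
sample_fireworks = [{"fw_id": 14979}, {"fw_id": 14980}, {"fw_id": 15001}]
sample_workflows = [{"nodes": [14979]}, {"nodes": [15001, 15002]}]

print(orphaned_fw_ids(sample_fireworks, sample_workflows))  # [14980]
```

Against a real database, the two arguments could presumably be filled from the launchpad's fireworks and workflows collections (for example lp.fireworks.find({}, {"fw_id": 1}) and lp.workflows.find({}, {"nodes": 1}) with lp = LaunchPad.auto_load()), but I have not run that here.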

Running lpad detect_lostruns fails with TypeError: 'NoneType' object is not subscriptable.

Since the calculations are not heavy, I will just restart the whole workflow.