Hi,
I’m trying to understand how to handle a firework (in this case a VASP calculation) that was killed after hitting the walltime limit. I use the following simple script to create a YAML file for a Ba atom:
```
from fireworks import Firework
from fireworks_vasp.tasks import WriteVaspInputTask, VaspCustodianTask, VaspAnalyzeTask
from pymatgen.core.structure import Structure
from pymatgen.io.vaspio import Poscar


def create_fireworks(structure, keyVal, viset='MPVaspInputSet', params={}, handlers="all",
                     vasp_cmd=["aprun", "-n", "16", "/path/to/vasp"]):
    name = structure.formula
    wf_name = name
    # Task 1 writes the VASP inputs, task 2 runs VASP under custodian,
    # task 3 analyzes the finished run.
    t1 = WriteVaspInputTask(structure=structure, vasp_input_set=viset, input_set_params=params)
    t2 = VaspCustodianTask(vasp_cmd=vasp_cmd, handlers=handlers)
    t3 = VaspAnalyzeTask()
    workflow = Firework([t1, t2, t3], name=name)
    return workflow


if __name__ == '__main__':
    inFileName = 'POSCAR_Ba'
    crystalStruc = Structure.from_file(inFileName)
    keyVal = 'Ba-Atom'
    workflow = create_fireworks(crystalStruc, keyVal)
    workflow.to_file("VASP_Ba.yaml")
    print('Program Complete')
```
I added the generated VASP_Ba.yaml file to the MongoDB through `lpad add VASP_Ba.yaml`.
The MongoDB in this case is on a remote host, which is attached to a specific port of localhost through ssh.
When I ran this job for 5 minutes, the calculation did not converge and the job was killed at the walltime limit. Somehow the custodian error handler was not activated to trigger a soft stop. As a result, the FW.json file (attached here) still shows a state of “RUNNING”. My question is twofold:
- How do I properly change the state of the firework to ‘FIZZLED’?
- How do I restart the calculation from task 2 (`VaspCustodianTask`), using the existing history in the launch directory (which would also require replacing POSCAR with CONTCAR)?
I realize that I can achieve the second step by using `append_wf` of the launchpad, but this requires creating a new firework with a new FW_ID. Is there a better way to do it, say through --task-recovery of rerun?
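
For reference, the POSCAR/CONTCAR swap itself is easy to script. Below is a minimal sketch of what I have in mind, using only the standard library. The function name `promote_contcar` and the `POSCAR.orig` backup name are my own hypothetical choices, not part of FireWorks or custodian, and the sketch only touches files in the launch directory; it does not change the firework state in the database.

```python
import os
import shutil


def promote_contcar(launch_dir):
    """Copy CONTCAR over POSCAR in launch_dir, backing up the old POSCAR.

    Returns True if the swap was made, False if no usable CONTCAR exists.
    (Hypothetical helper for illustration only.)
    """
    contcar = os.path.join(launch_dir, "CONTCAR")
    poscar = os.path.join(launch_dir, "POSCAR")

    # A job killed early can leave a missing or empty CONTCAR;
    # in that case keep the original POSCAR untouched.
    if not os.path.isfile(contcar) or os.path.getsize(contcar) == 0:
        return False

    # Keep the starting geometry around before overwriting it.
    if os.path.isfile(poscar):
        shutil.copy2(poscar, os.path.join(launch_dir, "POSCAR.orig"))

    shutil.copy2(contcar, poscar)
    return True
```

It would be called as `promote_contcar("/path/to/launch_dir")` before the VASP task is rerun in that directory.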
FW.json (3.58 KB)