Queue rapidfire doesn't realize it can run jobs

mjohnson541 · July 11, 2022, 9:28pm

I’m running a test example for submitting to a slurm queue:

from fireworks.queue.queue_launcher import rapidfire as rapidfirequeue
from fireworks.queue.queue_launcher import launch_rocket_to_queue
from fireworks.core.fworker import FWorker
import fireworks.fw_config
from fireworks.utilities.fw_serializers import load_object_from_file
from fireworks import Firework, Workflow, LaunchPad, ScriptTask
qadapter = load_object_from_file(fireworks.fw_config.QUEUEADAPTER_LOC)
launchpad = LaunchPad()
launchpad.reset('', require_password=False)
fw1 = Firework(ScriptTask.from_str('echo "hello" >> hello.txt'))
fw2 = Firework(ScriptTask.from_str('echo "goodbye" >> goodbye.txt'))
wf = Workflow([fw1, fw2], name="test workflow")
launchpad.add_wf(wf)
rapidfirequeue(launchpad, FWorker(), qadapter)

My qadapter file looks like:
_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch -w /home/mattsj/my_fworker.yaml -l /home/mattsj/my_launchpad.yaml singleshot
ntasks: 1
cpus_per_task: 8
walltime: '5-00:00:00'
queue: month-long-cpu
account: mattsj
job_name: null
logdir: /home/mattsj/fw_logs
pre_rocket: null
post_rocket: null

The fworker file is an exact copy of the tutorial example.

I’m on version 2.0.3 from conda-forge.

When I run the above Python script, the FW_job.out file among the output files contains:
No FireWorks are ready to run and match query! {'$or': [{'spec._fworker': {'$exists': False}}, {'spec._fworker': None}, {'spec._fworker': 'my first fireworker'}]}

Dumping the fireworks afterwards to dictionaries I get:

{'spec': {'_tasks': [{'script': ['echo "hello" >> hello.txt'],
                      'use_shell': True,
                      '_fw_name': 'ScriptTask'}]},
 'fw_id': 2,
 'created_on': '2022-07-11T21:19:46.133113',
 'updated_on': '2022-07-11T21:19:46.138695',
 'state': 'READY',
 'name': 'Unnamed FW'}

and
{'spec': {'_tasks': [{'script': ['echo "goodbye" >> goodbye.txt'],
                      'use_shell': True,
                      '_fw_name': 'ScriptTask'}]},
 'fw_id': 1,
 'created_on': '2022-07-11T21:19:46.133171',
 'updated_on': '2022-07-11T21:19:46.138699',
 'state': 'READY',
 'name': 'Unnamed FW'}

So it seems to not be running them because it thinks the current FWorker isn’t suitable.
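
A quick way to test that hypothesis is to evaluate the query from the message by hand against the dumped specs. The sketch below is plain Python, not FireWorks code: matches_fworker_query is a hypothetical helper that mimics how MongoDB would be expected to evaluate the three $or clauses, and pinned_spec is a made-up counterexample.

```python
# Hypothetical helper, not part of FireWorks: mimics the three $or clauses
# from the "No FireWorks are ready to run" message for a single spec dict.
def matches_fworker_query(spec, worker_name):
    if "_fworker" not in spec:               # {'spec._fworker': {'$exists': False}}
        return True
    if spec["_fworker"] is None:             # {'spec._fworker': None}
        return True
    return spec["_fworker"] == worker_name   # {'spec._fworker': '<name>'}


# Spec as dumped above: it does not set _fworker at all.
hello_spec = {"_tasks": [{"script": ['echo "hello" >> hello.txt'],
                          "use_shell": True,
                          "_fw_name": "ScriptTask"}]}
# Made-up counterexample: a spec pinned to a different worker.
pinned_spec = dict(hello_spec, _fworker="some other worker")

print(matches_fworker_query(hello_spec, "my first fireworker"))   # True
print(matches_fworker_query(pinned_spec, "my first fireworker"))  # False
```

Since the dumped specs carry no _fworker key, they would satisfy this query. The message may therefore mean that the rocket was looking at a different LaunchPad database than the one the script populated, although that is only a guess.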

mjohnson541 · July 13, 2022, 3:48am

This seems to have been a configuration issue. When I loaded the my_launchpad.yaml, my_fworker.yaml and my_qadapter.yaml files manually and passed the resulting objects to rapidfire, the issue went away.
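
A minimal sketch of what loading the three files explicitly could look like, for anyone hitting the same message. The helper names config_paths and launch_with_explicit_configs are hypothetical, and it assumes all three YAML files sit in one directory (the thread only shows /home/mattsj for the launchpad and fworker files).

```python
from pathlib import Path


def config_paths(config_dir):
    """Return the paths of the three config files named in this thread."""
    config_dir = Path(config_dir)
    return {
        "launchpad": config_dir / "my_launchpad.yaml",
        "fworker": config_dir / "my_fworker.yaml",
        "qadapter": config_dir / "my_qadapter.yaml",
    }


def launch_with_explicit_configs(config_dir):
    """Build LaunchPad, FWorker and qadapter from files, then submit to the queue."""
    # Imported here so config_paths() can be used without FireWorks installed.
    from fireworks import LaunchPad
    from fireworks.core.fworker import FWorker
    from fireworks.queue.queue_launcher import rapidfire
    from fireworks.utilities.fw_serializers import load_object_from_file

    paths = config_paths(config_dir)
    # Use the same files that rlaunch is given in the qadapter's rocket_launch
    # line, so the submitting script and the rocket talk to the same database.
    launchpad = LaunchPad.from_file(str(paths["launchpad"]))
    fworker = FWorker.from_file(str(paths["fworker"]))
    qadapter = load_object_from_file(str(paths["qadapter"]))
    rapidfire(launchpad, fworker, qadapter)
```

Calling launch_with_explicit_configs("/home/mattsj") would then replace the LaunchPad() and FWorker() defaults used in the original script. Workflows still have to be added to that same launchpad first.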

fraricci · July 13, 2022, 4:10pm

Hi @mjohnson541.

It might be that you need to specify the my_fworker.yaml file when you create the FWorker object, in a similar way to what you do for the launchpad and qadapter.
In other words, it might be that FWorker() does not find your config file (presumably /home/mattsj/my_fworker.yaml) when you create it.

I'm not sure though, because I usually run qlaunch from the command line and not from Python.

mdigennaro · January 21, 2023, 3:55pm

Dear all,
I am in a similar situation:
I previously submitted a number of WFs with qlaunch rapidfire and they were running fine.

After a while I had to pause some of them, and since then (a few hours now) FireWorks has only been running 5 WFs in total, while there are still 16 Slurm jobs running.

I have tried to use lpad.set_priority(), but that modifies the priority inside a WF, while I want to push a specific FW to start.

Do you have any suggestions?
Thanks
Marco

fraricci · January 25, 2023, 8:04pm

Hi Marco,
do you still have qlaunch rapidfire running?

mdigennaro · January 26, 2023, 11:35am

No, it exited on its own after sleeping for 60 seconds.

fraricci · January 27, 2023, 5:23pm

I'm not sure why this is happening. It needs more investigation.
I would try to rerun one of the READY FWs to see if qlaunch can see it again.
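
From Python, that rerun could be sketched roughly as follows. rerun_first_ready is a hypothetical helper name; it relies only on the LaunchPad methods get_fw_ids and rerun_fw, and it is untested against the situation described here.

```python
def rerun_first_ready(launchpad):
    """Rerun the first READY Firework found; return its fw_id, or None."""
    # Ask the LaunchPad for at most one Firework currently in the READY state.
    ready_ids = launchpad.get_fw_ids({"state": "READY"}, limit=1)
    if not ready_ids:
        return None
    fw_id = ready_ids[0]
    # Mark it for rerun so a later qlaunch can try to pick it up again.
    launchpad.rerun_fw(fw_id)
    return fw_id
```

The launchpad argument would be a real LaunchPad, for example one built with LaunchPad.from_file(...) from the same my_launchpad.yaml that qlaunch uses. After the rerun, start qlaunch rapidfire again and check whether the job gets submitted.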