Fireworkers continue running even when no ready FWs exist in database


I’ve noticed that my Fireworker jobs keep running even when there are no fireworks in the database which they could conceivably pull. I was wondering if this is intended behavior, or if there’s a way for me to modify the settings.

The details are as follows. I have several Relaxation-then-Static workflows in my database. When the workflow is “Ready”, the relaxation FW is set to “Ready” and the static FW is set to “Waiting”.

On occasion, the relaxation firework will Fizzle, and the entire workflow will be marked as Fizzled. The static fw will still be marked as “waiting”, since its parent fw never finished successfully, and as a result it will never be run.

However, even when there are no “Ready” workflows and no “Ready” fireworks left in the database, as long as a “waiting” static FW exists in a fizzled workflow, my fireworkers will continue to check for new fireworks, up to their walltime limit. This often results in 6-12 hours of “wasted” walltime where the fireworker is just checking every 60 seconds for new jobs.

I was able to confirm that when I marked these unreachable static FWs as “paused”, the fireworkers recognized that there were no more fireworks to pull and shut down automatically.
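In case it helps anyone scripting the same workaround, here is a minimal sketch of how the pausing step might be automated. It assumes the LaunchPad methods `get_wf_ids`, `get_wf_by_fw_id`, and `pause_fw` behave in your FireWorks version as they do in the public documentation; `pause_stranded_fws` is a hypothetical helper name, not part of FireWorks, so please try it on a test database first.

```python
def pause_stranded_fws(lpad):
    """Pause WAITING fireworks that belong to FIZZLED workflows.

    `lpad` is assumed to be a fireworks LaunchPad, for example the
    object returned by LaunchPad.auto_load(). Returns the fw_ids paused.
    """
    paused = []
    # Assumption: get_wf_ids yields one fw_id per matching workflow.
    for wf_id in lpad.get_wf_ids({"state": "FIZZLED"}):
        wf = lpad.get_wf_by_fw_id(wf_id)
        for fw in wf.fws:
            # Only touch fireworks that can never become READY.
            if fw.state == "WAITING":
                lpad.pause_fw(fw.fw_id)
                paused.append(fw.fw_id)
    return paused


# Usage (needs a configured FireWorks install and a reachable database):
#   from fireworks import LaunchPad
#   print(pause_stranded_fws(LaunchPad.auto_load()))
```

Paused fireworks can be resumed later if the fizzled parent is rerun, so this should be reversible.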

Ideally I would like it so that when a workflow is marked as fizzled, the fireworker ignores downstream ‘waiting’ fireworks as well. Short of modifying the workflow and the fireworks themselves, is there any way to modify fireworker behavior in this manner?



Based on what you said, I am assuming that you have rlaunch rapidfire inside your qadapter yaml file.

This is the intended behavior, though it can be confusing for exactly the reason you describe. Rapidfire is designed so that it can be run either from the command line or from a background task/crontab script to passively check for new jobs as they become available. When you’re trying to run thousands of fireworks, you want them to launch and get ready in the queue as soon as they are available.

When we start with atomate/fireworks, we often want to put a rapidfire launch into the qadapter yaml so that we can do as much as possible in a single queue submission. In my opinion, though, fireworks works best when you leave each queue submission as a singleshot execution and run rapidfire from the command line or a crontab to launch multiple smaller jobs.

That being said, you can customize this behavior. The rapidfire command takes several arguments that adjust its execution; you can list them by typing qlaunch rapidfire --help. One that might be of interest to you is --timeout, which tells FWs to stop looking for new jobs after a certain number of seconds.

Hope this quick description helps. Let me know if you need more details.

Thank you for your advice! I was able to find the rapidfire parameters easily based on your description.

After experimenting further with the parameters, I have settled on the following line in my qadapter.yaml file, which I will post in case it helps anyone else looking at this later:

rlaunch […] rapidfire --sleep 300 --max_loops 12

With this, the queue submission checks for rockets every 5 minutes instead of every minute, and times out after an hour (12 loops of 300 seconds). I believe this is the best balance for me between burning excess walltime and exiting too early to pick up recently added fireworks.
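As a quick sanity check on those numbers, the arithmetic can be sketched in Python. This assumes the worker sleeps for the full --sleep interval on each idle loop and gives up after --max_loops of them; `idle_budget_seconds` is a hypothetical helper for illustration, not part of FireWorks.

```python
def idle_budget_seconds(sleep_s: int, max_loops: int) -> int:
    """Upper bound on the time spent waiting with no fireworks to run."""
    return sleep_s * max_loops


budget = idle_budget_seconds(sleep_s=300, max_loops=12)
print(budget, "s =", budget / 3600, "h")  # 3600 s = 1.0 h
```

The same product can be used to pick other combinations, for example a shorter sleep with more loops, while keeping the idle time capped at whatever walltime you are willing to spend.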

Thank you for your help!