Hi Anubhav,
Thanks a lot for the help. Here’s the info:
- Can you paste the output of "lpad get_fws -s READY -d count" after the script crashes?
I’ve run this after two crashes now. The count was 1 after the first crash and 2 after the second.
- Would you mind running the script again with strm_lvl="DEBUG" and pasting the output again?
Here is the output, I’ve included a successful submission as well:
2017-02-08 10:53:34,428 INFO Job submission was successful and job_id is 1176878
2017-02-08 10:53:34,428 INFO Sleeping for 5 seconds…zzz…
2017-02-08 10:53:39,455 INFO Finished a round of launches, sleeping for 60 secs
2017-02-08 10:54:39,516 INFO Checking for Rockets to run…
2017-02-08 10:54:39,555 INFO The number of jobs currently in the queue is: 0
2017-02-08 10:54:39,555 INFO 0 jobs in queue. Maximum allowed by user: 20
2017-02-08 10:54:39,640 INFO Launching a rocket!
2017-02-08 10:54:39,647 DEBUG getting queue adapter
2017-02-08 10:54:39,733 INFO Created new dir /atlas/u/jkuck/rbpf_fireworks/block_2017-02-08-17-35-21-007249/launcher_2017-02-08-18-54-39-731710
2017-02-08 10:54:39,733 INFO moving to launch_dir /atlas/u/jkuck/rbpf_fireworks/block_2017-02-08-17-35-21-007249/launcher_2017-02-08-18-54-39-731710
2017-02-08 10:54:39,734 DEBUG writing queue script
2017-02-08 10:54:39,740 INFO submitting queue script
2017-02-08 10:54:41,842 INFO Job submission was successful and job_id is 1176879
2017-02-08 10:54:41,843 INFO Sleeping for 5 seconds…zzz…
2017-02-08 10:54:46,933 INFO Launching a rocket!
2017-02-08 10:54:46,940 DEBUG getting queue adapter
2017-02-08 10:54:46,961 INFO No jobs exist in the LaunchPad for submission to queue!
2017-02-08 10:54:46,961 ERROR ----|vvv|----
2017-02-08 10:54:46,962 ERROR Error with queue launcher rapid fire!
2017-02-08 10:54:46,965 ERROR Traceback (most recent call last):
  File "/atlas/u/jkuck/software/anaconda2/envs/anaconda_venv/lib/python2.7/site-packages/fireworks/queue/queue_launcher.py", line 216, in rapidfire
    raise RuntimeError("Launch unsuccessful!")
RuntimeError: Launch unsuccessful!
2017-02-08 10:54:46,965 ERROR ----|^^^|----
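To make the timing in this log easier to see, here is a small sketch that measures the gap between the last successful submission and the "No jobs exist" message. It uses three lines copied from the output above; parse_line is a hypothetical helper written for this post, not part of FireWorks.

```python
from datetime import datetime

# Three lines copied from the DEBUG output above.
LOG = """\
2017-02-08 10:54:41,842 INFO Job submission was successful and job_id is 1176879
2017-02-08 10:54:46,933 INFO Launching a rocket!
2017-02-08 10:54:46,961 INFO No jobs exist in the LaunchPad for submission to queue!
"""


def parse_line(line):
    """Split one log line into (timestamp, level, message)."""
    date, clock, level, message = line.split(" ", 3)
    when = datetime.strptime(date + " " + clock, "%Y-%m-%d %H:%M:%S,%f")
    return when, level, message


entries = [parse_line(line) for line in LOG.splitlines()]
submitted = next(e for e in entries if e[2].startswith("Job submission was successful"))
failed = next(e for e in entries if e[2].startswith("No jobs exist"))

# Seconds between the successful submission and the failed launch attempt.
gap = (failed[0] - submitted[0]).total_seconds()
print("seconds between submission and failure: %.3f" % gap)
```

The gap is about 5.1 seconds, so the failed launch is the very next attempt after the 5 second sleep, with no "Checking for Rockets to run" line in between.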
Best,
Jonathan
On Wednesday, February 8, 2017 at 9:56:07 AM UTC-8, Anubhav Jain wrote:
Hi Jonathan
Two things:
- Can you paste the output of "lpad get_fws -s READY -d count" after the script crashes?
- Would you mind running the script again with strm_lvl="DEBUG" and pasting the output again?
I haven’t seen or heard of this error before so it might take a little back and forth to figure out what’s happening.
Best,
Anubhav
On Tuesday, February 7, 2017 at 11:58:32 PM UTC-8, jkuck wrote:
Yes, the queue launcher crashes again after being restarted. I’m calling the queue launcher with fill_mode=False:
rapidfire(launchpad, FWorker(), qadapter, launch_dir='.', nlaunches='infinite', njobs_queue=20,
          njobs_block=500, sleep_time=None, reserve=False, strm_lvl='INFO', timeout=None,
          fill_mode=False)
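Since a manual restart gets things going again for a while, one stop-gap is to restart the launcher automatically. This is a minimal sketch, assuming a stop shows up either as rapidfire() returning or as a RuntimeError reaching the caller (the log alone does not say which). keep_launching and its parameters are hypothetical names, not part of FireWorks, and this works around the crash rather than fixing it.

```python
import time


def keep_launching(launch, max_restarts=5, pause_secs=60, sleep=time.sleep):
    """Call launch() again whenever it stops, up to max_restarts restarts.

    A stop is either launch() returning or launch() raising RuntimeError.
    Returns the total number of times launch() was called.
    """
    calls = 0
    while calls <= max_restarts:
        calls += 1
        try:
            launch()
        except RuntimeError as exc:
            print("launcher raised: %s" % exc)
        if calls <= max_restarts:
            # Pause before the next restart; no pause after the final call.
            sleep(pause_secs)
    return calls


# Intended use (not executed here): wrap the rapidfire(...) call shown above, e.g.
# keep_launching(lambda: rapidfire(launchpad, FWorker(), qadapter, ...))
```

The restart count is capped so that a launcher that fails immediately every time does not loop forever.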
Thanks,
Jonathan
On Tuesday, February 7, 2017 at 11:51:50 PM UTC-8, Joseph Montoya wrote:
Just to get a bit more info, does the issue persist when you restart the queue launcher? Also, are you using fill mode?
Best,
Joey
On Feb 7, 2017, at 11:30 PM, jkuck wrote:
Hi,
I’m trying to run a long workflow that dynamically creates new fireworks at every iteration. I’m running the workflow with a queue launcher in infinite mode. Usually after around 5 iterations (50-100 fireworks) the queue launcher crashes as follows:
2017-02-07 22:56:21,500 INFO Sleeping for 5 seconds…zzz…
2017-02-07 22:56:26,592 INFO Launching a rocket!
2017-02-07 22:56:26,616 INFO No jobs exist in the LaunchPad for submission to queue!
2017-02-07 22:56:26,616 ERROR ----|vvv|----
2017-02-07 22:56:26,616 ERROR Error with queue launcher rapid fire!
2017-02-07 22:56:26,618 ERROR Traceback (most recent call last):
  File "/atlas/u/jkuck/software/anaconda2/envs/anaconda_venv/lib/python2.7/site-packages/fireworks/queue/queue_launcher.py", line 216, in rapidfire
    raise RuntimeError("Launch unsuccessful!")
RuntimeError: Launch unsuccessful!
2017-02-07 22:56:26,619 ERROR ----|^^^|----
It looks like the queue launcher thinks a firework is ready to launch, but launch_rocket_to_queue() then finds no jobs in the LaunchPad to submit. Any tips would be appreciated!
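The suspected pattern can be sketched as a toy check-then-act model. This is only one possible reading of the log, not a confirmed diagnosis: ToyLaunchPad, has_ready, claim_one and launch_round are hypothetical names made up for illustration and are not the FireWorks API. The assumption is that something else (for example a job that was already submitted and has started running) claims the last ready firework between the launcher's first check and its second one.

```python
class ToyLaunchPad(object):
    """Hypothetical stand-in that only counts ready fireworks."""

    def __init__(self, n_ready):
        self.n_ready = n_ready

    def has_ready(self):
        return self.n_ready > 0

    def claim_one(self):
        """Simulate a running job picking up one ready firework."""
        if self.n_ready > 0:
            self.n_ready -= 1
            return True
        return False


def launch_round(pad, between_checks=None):
    """One launch attempt: check for work, then check again before submitting."""
    if not pad.has_ready():          # first check, made by the outer loop
        return "nothing to do"
    if between_checks is not None:
        between_checks()             # another process acts in the gap
    if not pad.has_ready():          # second check, made inside the launch step
        raise RuntimeError("Launch unsuccessful!")
    return "submitted"


# No interference: the launch goes through.
print(launch_round(ToyLaunchPad(1)))

# The only ready firework is claimed between the two checks.
pad = ToyLaunchPad(1)
try:
    launch_round(pad, between_checks=pad.claim_one)
except RuntimeError as exc:
    print("failed: %s" % exc)
```

In this model the error is raised only when the state changes between the two checks, which would fit a crash that shows up intermittently rather than on every launch.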
Thanks,
Jonathan