Queue submission of multiple workflows (nlaunches, num_launched, jobs_in_queue)

I have another question, somewhat related to my last post, about using queue_launcher.rapidfire with nlaunches=0 and njobs_queue=1.

Say I have two workflows, WF1 and WF2, that must not overlap, i.e. WF2 has to wait for WF1 to complete.

Sometimes I need to launch WF2 before WF1 has finished submitting Firework jobs to the queue.

It looks like this cannot be done, because when I launch WF2 and execution reaches line 216:

    if njobs_queue and jobs_in_queue >= njobs_queue:

jobs_in_queue is 1 because WF1 is still submitting jobs. Then, at line 251:

    if num_launched == nlaunches or \

num_launched is still 0 because WF2 has not yet been able to submit any job. Since nlaunches is also 0, num_launched == nlaunches evaluates to true, so the function returns and the entire rapidfire launch for WF2 is skipped.
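To make the sequence concrete, here is a toy model of the two checks quoted above. It is not the FireWorks source: the helper names blocked_by_queue and should_stop_unguarded are made up for illustration, and the remaining clauses of the real exit condition are left out.

```python
def blocked_by_queue(jobs_in_queue: int, njobs_queue: int) -> bool:
    """Toy version of the line-216 check: the queue is considered full."""
    return bool(njobs_queue) and jobs_in_queue >= njobs_queue


def should_stop_unguarded(num_launched: int, nlaunches: int) -> bool:
    """Toy version of the first clause of the line-251 check."""
    return num_launched == nlaunches


# State seen by the launcher started for WF2 (values from the post).
nlaunches = 0      # 0 is used here to mean "no fixed launch count"
njobs_queue = 1    # at most one job in the queue at a time
jobs_in_queue = 1  # WF1's job is still queued
num_launched = 0   # WF2 has not submitted anything yet

if not blocked_by_queue(jobs_in_queue, njobs_queue):
    num_launched += 1  # never reached: the queue is full

# 0 == 0, so the exit test fires although nothing was launched.
print(should_stop_unguarded(num_launched, nlaunches))
```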

I solved the issue by requiring nlaunches to be strictly positive before checking whether it equals num_launched:

    if nlaunches > 0 and num_launched == nlaunches or \
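As a rough illustration of the difference, the sketch below replays a few polling passes with and without the guard. The simulate function and its list of observed queue sizes are hypothetical, and only the first clause of the exit condition is modelled, so this is a simplified picture rather than the real rapidfire loop.

```python
from typing import List


def simulate(queue_sizes: List[int], nlaunches: int, njobs_queue: int,
             guarded: bool) -> int:
    """Return how many jobs a toy launcher submits.

    queue_sizes holds the (assumed) value of jobs_in_queue seen on each
    polling pass; one job is submitted per pass when the queue has room.
    """
    num_launched = 0
    for jobs_in_queue in queue_sizes:
        queue_full = bool(njobs_queue) and jobs_in_queue >= njobs_queue
        if not queue_full:
            num_launched += 1

        if guarded:
            stop = nlaunches > 0 and num_launched == nlaunches
        else:
            stop = num_launched == nlaunches
        if stop:
            break
    return num_launched


# WF1 occupies the queue for two passes, then leaves.
passes = [1, 1, 0]
print(simulate(passes, nlaunches=0, njobs_queue=1, guarded=False))  # 0
print(simulate(passes, nlaunches=0, njobs_queue=1, guarded=True))   # 1
```

With the guard, nlaunches=0 no longer matches the initial num_launched of 0, so the launcher keeps polling until the queue has room; a positive nlaunches still stops the loop once that many jobs have been submitted.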

Again, is there anything I’m missing about how queue_launcher.rapidfire is meant to work?

Thanks
Primer

Hi

I am a little lost by the question, but I think your overall suggestion is correct, i.e. the num_launched == nlaunches check is only intended to apply when nlaunches > 0.

I made some code changes to the github repo:

https://github.com/materialsproject/fireworks/commit/85a51de83a9d9d62504f705b8fe74c4989e900db

and will probably release a new FWS version to PyPI next week (e.g., after also hearing back from you on the other issue). Let me know if you need one before then; it’s no problem to do a release if it helps.

Best,

Anubhav

Now that I read my question again, I see that while trying to keep it short I made it too succinct.

Anyway, thanks for these changes as well. Everything seems to be working fine here.

No need for an urgent release; I downloaded the modified code from the repo.

Primer

On Friday, March 16, 2018 at 19:01:26 UTC+1, Anubhav Jain wrote:

Hi

I am a little lost by the question but I think your overall suggestion is correct, i.e. the num_launched == nlaunches is only intended to take place if nlaunches > 0.

I made some code changes to the github repo:

https://github.com/materialsproject/fireworks/commit/85a51de83a9d9d62504f705b8fe74c4989e900db

and will probably release a new FWS version to PyPI next week (e.g., after also hearing back from you on the other issue). Let me know if you need one before then, it’s no problem to do a release if it helps.

Best,

Anubhav
