Queue rapidfire with WAITING fireworks

Hi, for some reason my email address appears to be rejected, so I’m writing with another one.

I’m not sure if Anubhav received my follow-up below. I wonder if there is a way of getting the same behaviour without patching the code.

Primer

Hi Anubhav, first of all my apologies for the multiple posts. None of them seemed to go through in the web interface. No idea why.

My workflow is indeed intended to run each firework only after the previous one has finished and yes, submitting jobs ahead of time is out of the question.

For this reason I thought that calling queue_launcher.rapidfire with nlaunches=0 was the right thing to do. I expected that the function would wait for FW1 to complete, then submit FW2 to the queue, and so on, but that doesn’t happen.

Digging into the code, I saw that I could get the needed behaviour by writing LaunchPad.waiting_exists() and adding it to the ‘if’ condition at lines 251-253

    if num_launched == nlaunches or (
            timeout and (datetime.now() - start_time).total_seconds() >= timeout) or (
            nlaunches == 0 and not launchpad.run_exists(fworker)
            and not launchpad.waiting_exists(fworker)):
        break

This way, after launching FW1, rapidfire can go on “sleeping for {sleep_time} secs” and “Checking for Rockets to run” instead of returning and leaving FW2, FW3 and FW4 behind.
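To make the difference concrete, here is a minimal toy simulation of the loop with and without the extra check. It is only a sketch: the function `simulate`, the flag `use_waiting_check` and the in-memory state dictionary are made up for illustration and are not part of FireWorks, and "sleeping" is modelled as the running firework completing and unlocking its child.

```python
def simulate(use_waiting_check, chain=("FW1", "FW2", "FW3", "FW4")):
    """Return the fireworks launched by a simplified rapidfire loop (nlaunches=0)."""
    states = {name: "WAITING" for name in chain}
    states[chain[0]] = "READY"
    launched = []
    while True:
        # Launch everything that is READY (stand-in for a queue submission).
        for name in chain:
            if states[name] == "READY":
                states[name] = "RUNNING"
                launched.append(name)
        run_exists = "READY" in states.values()
        waiting_exists = "WAITING" in states.values()
        # Simplified exit condition: timeout and launch count are left out.
        if not run_exists and not (use_waiting_check and waiting_exists):
            break
        # "Sleeping": the running firework completes and unlocks its child.
        for i, name in enumerate(chain):
            if states[name] == "RUNNING":
                states[name] = "COMPLETED"
                if i + 1 < len(chain):
                    states[chain[i + 1]] = "READY"
    return launched


print(simulate(use_waiting_check=False))
print(simulate(use_waiting_check=True))
```

In this toy model the unpatched loop launches only FW1, while the patched one keeps going until all four have been launched.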

Even if my issue seems solved, most probably I misunderstood something, but I can’t see what right now.

Thanks for any guidance
Primer

On Sunday 18/02/2018 at 23.28, Primer wrote:

On Sunday 18/02/2018 at 00.18, Anubhav Jain wrote:
Hi Primer,

I imagine that these four FWs are dependent on one another, i.e., you definitely can’t run the second FW until the first FW completes. If that is not the case (i.e., the four FWs can be run simultaneously), then you should create four different workflows, each with one Firework, and that would solve the problem.

Given that the four FWs are dependent on one another, the queue launcher doesn’t allow you (in reservation mode) to submit jobs that are not READY to run. Let’s take the opposite case: let’s pretend the queue launcher let you submit all 4 queue jobs simultaneously.

Queue job 1 is given the green light to run from your queueing system and starts running FW1. However, before FW1 finishes, let’s say queue job 2 is also given the green light to run from your queueing system. Now queue job 2 has nothing to do - FW2 is not ready to run yet, it’s waiting for FW1 since it is dependent on it. What would happen is queue job 2 would wake up and just quit (technically, it could also be programmed to sit around waiting for FW1 to finish, but this would be wasting your CPU time). Similarly, if queue jobs 3 and 4 started running before their associated FWs were ready to run, they would just quit.

To prevent this kind of situation, reservation mode (where each queue job can only run a specific FW) is restricted to not letting you submit jobs until the FW is actually ready to run.
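The scenario above can be sketched with a small toy model. Everything in it is an assumption for illustration (the function name `run_reserved_jobs`, the fixed `runtime`, and the rule that a job which starts too early simply quits); it does not use the FireWorks API.

```python
def run_reserved_jobs(start_times, runtime=10):
    """Toy model: queue job i is tied to firework i of a linear chain.

    start_times[i] is when the queueing system starts job i (ascending order
    assumed). A job runs only if the previous firework has already finished;
    otherwise it wakes up, finds nothing to do, and quits.
    """
    finished_at = None  # completion time of the previous firework, if it ran
    outcome = []
    for i, start in enumerate(start_times):
        parent_done = i == 0 or (finished_at is not None and finished_at <= start)
        if parent_done:
            outcome.append("ran")
            finished_at = start + runtime
        else:
            outcome.append("quit")
            finished_at = None  # this firework never ran, so its child stays blocked
    return outcome


# Job 2 starts while FW1 is still running, so it quits and blocks the rest.
print(run_reserved_jobs([0, 3, 25, 40]))
# Each job starts only after the previous firework has finished.
print(run_reserved_jobs([0, 10, 20, 30]))
```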

Best,
Anubhav

On Fri, Feb 16, 2018 at 8:36 AM, Primer wrote:
Hi,
I have a workflow composed of four fireworks. I need to launch each firework independently through a queue, reserving it in order to override queue parameters and assign different resources to each launch.

So I tried queue_launcher.rapidfire with reserve=True and nlaunches=0, but only the first job gets submitted, then rapidfire returns.

It isn’t clear to me if this is the expected behaviour, i.e. only standalone fireworks can be submitted in this fashion, and not workflow pieces.

Looking at the code, I saw that this happens because the launching loop exits depending on (among other conditions) the current value of launchpad.run_exists(fworker). This method returns True if there are READY fireworks in the database. In my case, after the first launch I have one RESERVED, then RUNNING firework, while the remaining fireworks are correctly WAITING. Therefore no READY fireworks are found at that point: game over.
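As a rough illustration of that situation, here is a toy check on an in-memory dictionary of states. The `run_exists` function below is a stand-in written for this example, not the real LaunchPad method, and it only mimics the "is anything READY?" behaviour described above.

```python
def run_exists(states):
    """Stand-in for the check described above: is any firework READY?"""
    return any(state == "READY" for state in states.values())


# State right after FW1 has been submitted in reservation mode.
states = {"FW1": "RESERVED", "FW2": "WAITING", "FW3": "WAITING", "FW4": "WAITING"}
nlaunches = 0

# Only the part of the exit condition that matters here.
should_exit = nlaunches == 0 and not run_exists(states)
print(should_exit)
```

With FW1 RESERVED (or RUNNING) and the rest WAITING, nothing is READY, so the condition is true and the loop stops.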

Using nlaunches='infinite' is an option, but this way I have a never-ending process which needs to be manually terminated.

For the time being I patched the code by defining a LaunchPad.waiting_exists() function that queries WAITING fireworks, and adding it to the condition for exiting the loop in queue_launcher.rapidfire.

Am I missing something? Is there a proper, better way to proceed?

Thank you
Primer

Can you see if the code works if you change:

launchpad.run_exists(fworker)

to

launchpad.future_run_exists(fworker)

in the code you mentioned below (without adding waiting_exists)? If that works, I can make the change and release a new version. I’ll post updates here:

https://github.com/materialsproject/fireworks/issues/266
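For anyone comparing the two checks, here is a toy sketch of the difference as I understand it. It assumes that future_run_exists is true when something is READY, or when a RESERVED or RUNNING firework still has children to unlock; that reading is an assumption, and the functions below work on plain dictionaries rather than on a real LaunchPad.

```python
def run_exists(states):
    """Toy check: is any firework READY right now?"""
    return "READY" in states.values()


def future_run_exists(states, children):
    """Toy check: READY now, or an in-flight firework has children to unlock."""
    if run_exists(states):
        return True
    return any(
        state in ("RESERVED", "RUNNING") and bool(children.get(name))
        for name, state in states.items()
    )


# Linear four-firework chain (hypothetical names).
children = {"FW1": ["FW2"], "FW2": ["FW3"], "FW3": ["FW4"], "FW4": []}

after_first_launch = {"FW1": "RESERVED", "FW2": "WAITING", "FW3": "WAITING", "FW4": "WAITING"}
last_one_running = {"FW1": "COMPLETED", "FW2": "COMPLETED", "FW3": "COMPLETED", "FW4": "RUNNING"}
first_one_failed = {"FW1": "FIZZLED", "FW2": "WAITING", "FW3": "WAITING", "FW4": "WAITING"}

print(run_exists(after_first_launch), future_run_exists(after_first_launch, children))
print(future_run_exists(last_one_running, children))
print(future_run_exists(first_one_failed, children))
```

Under these assumptions the loop keeps going while FW1 is in flight and stops once FW4 is running. It also stops if FW1 fails and leaves its children WAITING, whereas a plain "any WAITING fireworks?" check would keep the loop alive in that case.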

Yes, that looks exactly like what I needed, and up to now it works.

Thank you for pointing out the future_run_exists() function, which I didn’t notice.

Primer

On Friday, March 16, 2018 at 18:54:54 UTC+1, Anubhav Jain wrote:

Can you see if the code works if you change:

launchpad.run_exists(fworker)

to

launchpad.future_run_exists(fworker)

in the code you mentioned below (without adding waiting_exists)? If that works, I can make the change and release a new version. I’ll post updates here:

https://github.com/materialsproject/fireworks/issues/266
