How to qlaunch (rapidfire-style) simultaneous collections of FireWorks

I have several hundred .yaml files that look like this:
name: mead
category:
query: '{}'
fws:
- fw_id: 1
  spec:
    _tasks:
    - _fw_name: PyTask
      func: update_cubit.main
      stored_data_varname: "cubit_metadata"
      kwargs: {"cubitfile": "/home/Synclesis/Projects/STTR_A15A-T004/Sandboxes/dshaw/fireflow/xuchen/sloped-box.py", "var_names": {"1": "aperture2_width", "0": "aperture1_height"}, "errfile": "/home/Synclesis/Projects/STTR_A15A-T004/Sandboxes/dshaw/fireflow/xuchen/cubit.err", "var_values": " 0.4 0.6\n", "exofile": "/home/Synclesis/Projects/STTR_A15A-T004/Sandboxes/dshaw/fireflow/xuchen/results/000000_output/sloped-box.exo"}
.
.
.
- fw_id: 22
  spec:
    _tasks:
    - _fw_name: PyTask
      func: movedir.main
      stored_data_varname: "movedir_metadata"
      kwargs: {"targetdir": "/home/Synclesis/Projects/STTR_A15A-T004/Sandboxes/dshaw/fireflow/xuchen/results/000000_output/", "FW": 21, "filename": "/home/Synclesis/Projects/STTR_A15A-T004/Sandboxes/dshaw/fireflow/transition/paths.txt"}
links:
  1:
  - 2
  2:
  - 3
  3:
  - 4
  4:
  - 5
  5:
  - 6
  6:
  - 7
  7:
  - 8
  8:
  - 9
  9:
  - 10
  10:
  - 11
  11:
  - 12
  12:
  - 13
  13:
  - 14
  14:
  - 15
  15:
  - 16
  16:
  - 17
  17:
  - 18
  18:
  - 19
  19:
  - 20
  20:
  - 21
  21:
  - 22
metadata: {}

I am trying to use qlaunch (with SLURM) to launch ~10 of these yaml tasks at a time. Every combination of qadapter.yaml settings, 'lpad add' calls, and 'qlaunch' submissions that I have tried (the latter two are driven by a Python script) launches only the first .yaml task (multi_task_0.yaml). The fireworks in multi_task_0.yaml are executed sequentially, as they should be, but the fireworks in the following 'multi_task_#.yaml' files get hung up behind them instead of being launched in parallel. That is, I'm trying to run 'multi_task_1.yaml' through 'multi_task_10.yaml' simultaneously, with the rest of the 'multi_task_#.yaml' files queueing up behind them.

I’ve attached the fworker, launchpad, and qadapter .yaml files.

The Python script that tries to launch the FWs currently looks like this:

import os

for irun in range(int(nruns)):  # nruns is assumed to be defined earlier in the script
    # -E so that "+" means "one or more digits" (GNU sed)
    command = "sed -i -E 's/task_[0-9]+/task_" + str(irun) + "/g' qadapter.yaml"
    os.system(command)
    command = 'qlaunch -l my_launchpad.yaml -w my_fworker.yaml -q qadapter.yaml rapidfire --nlaunches infinite'
    os.system(command)

Thanks in advance,

Dan

my_fworker.yaml (36 Bytes)

my_launchpad.yaml (102 Bytes)

qadapter.yaml (730 Bytes)

Hi Daniel,

I am not sure what could be going wrong based on your description.

Let's say you've run "lpad add" on all 10 of your YAML files. Afterward, you can try:

lpad get_fws -s READY -d count

This will count the number of Fireworks that are ready to run and that can potentially run in parallel. Immediately after adding 10 YAML files, that count should equal 10.

Can you confirm that it’s the case?
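As a rough illustration of why that count should be 10: in the workflow pasted in the original post, `links` forms a single chain (1 → 2 → ... → 22), so only the first Firework of each workflow has no parent to wait for. The sketch below models this in plain Python. It does not call FireWorks at all, and `chain_links` and `initially_ready` are hypothetical helper names used only for this illustration.

```python
def chain_links(n_fws):
    """Links for a linear workflow: fw 1 -> 2 -> ... -> n_fws."""
    return {i: [i + 1] for i in range(1, n_fws)}


def initially_ready(links, n_fws):
    """Fireworks that are nobody's child, i.e. the ones that can start right away."""
    children = {child for kids in links.values() for child in kids}
    return [fw_id for fw_id in range(1, n_fws + 1) if fw_id not in children]


links = chain_links(22)                    # same shape as the `links` section of the post
ready_per_workflow = initially_ready(links, 22)
n_workflows = 10                           # ten YAML files added with "lpad add"
total_ready = n_workflows * len(ready_per_workflow)
print(ready_per_workflow, total_ready)
```

This prints `[1] 10`: one startable Firework per workflow, ten workflows. If the real count reported by lpad differs, the workflows were probably not all added, or some are in a state other than READY.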

Also a few side notes:

  • You don’t need “category” or “query” keys in the YAML files for the Workflow as pasted in your message (first few lines). Those are only for my_fworker.yaml.

  • I am not sure exactly what your qlaunch script is trying to achieve. You should be able to execute "qlaunch -l my_launchpad.yaml -w my_fworker.yaml -q qadapter.yaml rapidfire --nlaunches infinite" just once. That one execution should submit many jobs to your queue, limited only by (i) the number of READY fireworks above or (ii) the maximum number of jobs you want to keep in the queue (this can be set with the -m parameter of qlaunch). I don't understand the loop that uses sed to change the task number in qadapter.yaml. However, if you are trying to use a different job name for each Firework, you might consider running qlaunch in reservation mode. There is extensive documentation about it, but the gist is that you add the "-r" option to qlaunch and the name of the job script is then set to the name of the FW you are launching.
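The single-invocation approach from the notes above could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: it only builds and prints the command line, the file names are the ones from the original post, and placing `-m` after the `rapidfire` subcommand and `-r` before it is an assumption about the qlaunch argument layout (check `qlaunch --help` and `qlaunch rapidfire --help`). `build_qlaunch` and `jobs_to_submit` are hypothetical helpers; the latter merely restates limits (i) and (ii) as arithmetic.

```python
def build_qlaunch(max_jobs_in_queue=None, reserve=False):
    """Build one qlaunch rapidfire command; nothing is executed here."""
    cmd = ["qlaunch", "-l", "my_launchpad.yaml", "-w", "my_fworker.yaml",
           "-q", "qadapter.yaml"]
    if reserve:
        cmd.append("-r")                        # reservation mode (assumed position)
    cmd += ["rapidfire", "--nlaunches", "infinite"]
    if max_jobs_in_queue is not None:
        cmd += ["-m", str(max_jobs_in_queue)]   # cap on jobs kept in the queue
    return cmd


def jobs_to_submit(n_ready, n_in_queue, max_jobs_in_queue):
    """Model of the two limits: READY Fireworks and free queue slots."""
    free_slots = max(0, max_jobs_in_queue - n_in_queue)
    return min(n_ready, free_slots)


cmd = build_qlaunch(max_jobs_in_queue=10)
print(" ".join(cmd))
print(jobs_to_submit(n_ready=300, n_in_queue=0, max_jobs_in_queue=10))
```

To actually submit, the list could be passed to `subprocess.run(cmd, check=True)` once, outside any loop. With `--nlaunches infinite` the call is not expected to return on its own.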


On Thursday, April 5, 2018 at 5:01:22 PM UTC-7, Daniel Shaw wrote:
