qlaunch rapidfire with reservation and maxjobs_queue on Cori keeps launching jobs

Hello,

I have only recently ventured into the world of launchpads and rockets, so I am not sure whether I am doing something wrong or this is a bug.

I am trying to run many simple structure optimization workflows using qlaunch -r rapidfire -m N, where -m sets the maximum number of jobs to keep in the queue. This works as expected on our SGE cluster. On NERSC Cori, however, qlaunch correctly detects when the maximum is reached and sleeps for 60 s, but when it wakes up it detects 0 jobs in the queue and launches another round of N jobs. This repeats until a job has been submitted for every workflow available in the launchpad.

For example, here’s an illustration with -m 2:

lbluque@cori01:/global/cscratch1/sd/lbluque/crfew> qlaunch -r rapidfire -m 2
2020-09-11 08:56:21,274 INFO getting queue adapter
2020-09-11 08:56:21,275 INFO Found previous block, using /global/cscratch1/sd/lbluque/crfew/block_2020-09-06-20-44-35-856509
2020-09-11 08:56:21,976 INFO The number of jobs currently in the queue is: 0
2020-09-11 08:56:21,976 INFO 0 jobs in queue. Maximum allowed by user: 2
2020-09-11 08:56:23,814 DEBUG FW with id: 2478 is unique!
2020-09-11 08:56:23,814 INFO Launching a rocket!
2020-09-11 08:56:23,824 DEBUG FW with id: 2478 is unique!
2020-09-11 08:56:23,832 DEBUG FW with id: 2478 is unique!
2020-09-11 08:56:23,835 DEBUG Created/updated Launch with launch_id: 2846
2020-09-11 08:56:23,863 DEBUG RESERVED FW with id: 2478
2020-09-11 08:56:23,863 INFO reserved FW with fw_id: 2478
2020-09-11 08:56:23,864 INFO Created new dir /global/cscratch1/sd/lbluque/crfew/block_2020-09-06-20-44-35-856509/launcher_2020-09-11-15-56-23-863332
2020-09-11 08:56:23,867 INFO moving to launch_dir /global/cscratch1/sd/lbluque/crfew/block_2020-09-06-20-44-35-856509/launcher_2020-09-11-15-56-23-863332
2020-09-11 08:56:23,869 INFO submitting queue script
2020-09-11 08:56:24,076 INFO Job submission was successful and job_id is 34253241
2020-09-11 08:56:24,082 INFO Sleeping for 1 seconds…zzz…
2020-09-11 08:56:25,090 DEBUG FW with id: 2477 is unique!
2020-09-11 08:56:25,091 INFO Launching a rocket!
2020-09-11 08:56:25,103 DEBUG FW with id: 2477 is unique!
2020-09-11 08:56:25,110 DEBUG FW with id: 2477 is unique!
2020-09-11 08:56:25,113 DEBUG Created/updated Launch with launch_id: 2847
2020-09-11 08:56:25,140 DEBUG RESERVED FW with id: 2477
2020-09-11 08:56:25,140 INFO reserved FW with fw_id: 2477
2020-09-11 08:56:25,141 INFO Created new dir /global/cscratch1/sd/lbluque/crfew/block_2020-09-06-20-44-35-856509/launcher_2020-09-11-15-56-25-140352
2020-09-11 08:56:25,144 INFO moving to launch_dir /global/cscratch1/sd/lbluque/crfew/block_2020-09-06-20-44-35-856509/launcher_2020-09-11-15-56-25-140352
2020-09-11 08:56:25,158 INFO submitting queue script
2020-09-11 08:56:25,324 INFO Job submission was successful and job_id is 34253242
2020-09-11 08:56:25,329 INFO Sleeping for 1 seconds…zzz…
2020-09-11 08:56:26,339 DEBUG FW with id: 2476 is unique!
2020-09-11 08:56:26,339 INFO Jobs in queue (2) meets/exceeds maximum allowed (2)
2020-09-11 08:56:26,347 DEBUG FW with id: 2476 is unique!
2020-09-11 08:56:26,347 INFO Finished a round of launches, sleeping for 60 secs
2020-09-11 08:57:26,407 INFO Checking for Rockets to run…
2020-09-11 08:57:26,625 INFO The number of jobs currently in the queue is: 0
2020-09-11 08:57:26,625 INFO 0 jobs in queue. Maximum allowed by user: 2
2020-09-11 08:57:26,634 DEBUG FW with id: 2484 is unique!
2020-09-11 08:57:26,634 INFO Launching a rocket!
2020-09-11 08:57:26,642 DEBUG FW with id: 2484 is unique!
2020-09-11 08:57:26,649 DEBUG FW with id: 2484 is unique!
2020-09-11 08:57:26,652 DEBUG Created/updated Launch with launch_id: 2850
2020-09-11 08:57:26,679 DEBUG RESERVED FW with id: 2484
2020-09-11 08:57:26,679 INFO reserved FW with fw_id: 2484
2020-09-11 08:57:26,680 INFO Created new dir /global/cscratch1/sd/lbluque/crfew/block_2020-09-06-20-44-35-856509/launcher_2020-09-11-15-57-26-679990
2020-09-11 08:57:26,683 INFO moving to launch_dir /global/cscratch1/sd/lbluque/crfew/block_2020-09-06-20-44-35-856509/launcher_2020-09-11-15-57-26-679990
2020-09-11 08:57:26,685 INFO submitting queue script
2020-09-11 08:57:27,474 INFO Job submission was successful and job_id is 34253254
2020-09-11 08:57:27,480 INFO Sleeping for 1 seconds…zzz…
2020-09-11 08:57:28,492 DEBUG FW with id: 2483 is unique!
2020-09-11 08:57:28,492 INFO Launching a rocket!
2020-09-11 08:57:28,500 DEBUG FW with id: 2483 is unique!
2020-09-11 08:57:28,507 DEBUG FW with id: 2483 is unique!
2020-09-11 08:57:28,510 DEBUG Created/updated Launch with launch_id: 2851
2020-09-11 08:57:28,540 DEBUG RESERVED FW with id: 2483
2020-09-11 08:57:28,540 INFO reserved FW with fw_id: 2483
2020-09-11 08:57:28,541 INFO Created new dir /global/cscratch1/sd/lbluque/crfew/block_2020-09-06-20-44-35-856509/launcher_2020-09-11-15-57-28-540695
2020-09-11 08:57:28,545 INFO moving to launch_dir /global/cscratch1/sd/lbluque/crfew/block_2020-09-06-20-44-35-856509/launcher_2020-09-11-15-57-28-540695
2020-09-11 08:57:28,547 INFO submitting queue script
2020-09-11 08:57:29,532 INFO Job submission was successful and job_id is 34253255
2020-09-11 08:57:29,538 INFO Sleeping for 1 seconds…zzz…
2020-09-11 08:57:30,549 DEBUG FW with id: 2482 is unique!
2020-09-11 08:57:30,549 INFO Jobs in queue (2) meets/exceeds maximum allowed (2)
2020-09-11 08:57:30,555 DEBUG FW with id: 2482 is unique!
2020-09-11 08:57:30,556 INFO Finished a round of launches, sleeping for 60 secs
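
To make the pattern in the log concrete, here is a toy model of a capped launch loop. It is only a sketch of how I understand the behavior, not FireWorks code: run_rounds and count_queued are hypothetical names, the sleep between rounds is left out, and the FW ids are taken from the log above.

```python
from typing import Callable, List


def run_rounds(count_queued: Callable[[], int], pending: List[int],
               max_jobs: int, rounds: int) -> List[int]:
    """Toy model of a rapidfire loop with a queue cap (not FireWorks code).

    Each round asks the scheduler how many jobs are queued, then submits
    pending items until the cap is reached.
    """
    submitted = []
    for _ in range(rounds):
        in_queue = count_queued()
        while pending and in_queue < max_jobs:
            submitted.append(pending.pop(0))
            in_queue += 1  # count our own submission within this round
    return submitted


# A scheduler query that always reports an empty queue, as in the log.
always_zero = run_rounds(lambda: 0, [2478, 2477, 2484, 2483],
                         max_jobs=2, rounds=2)

# A scheduler query that reports the two jobs submitted in round one.
reports = iter([0, 2])
correct = run_rounds(lambda: next(reports), [2478, 2477, 2484, 2483],
                     max_jobs=2, rounds=2)

print(always_zero)  # [2478, 2477, 2484, 2483]
print(correct)      # [2478, 2477]
```

With a query that always returns 0, every round submits a fresh batch, which matches what I see on Cori. With a correct count, the second round submits nothing.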

I’m using the latest version, 1.9.6.
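
One cross-check I can run independently of qlaunch is to count my jobs straight from the scheduler. The sketch below assumes the scheduler on Cori is Slurm and that squeue -h -u <user> prints one line per job with no header. count_jobs and jobs_in_queue are hypothetical helper names, and I have not confirmed that this is the query qlaunch itself uses.

```python
import subprocess


def count_jobs(squeue_output: str) -> int:
    """Count non-empty lines, assuming one line per job and no header."""
    return sum(1 for line in squeue_output.splitlines() if line.strip())


def jobs_in_queue(user: str) -> int:
    """Ask Slurm for the user's jobs; -h suppresses the header row."""
    result = subprocess.run(
        ["squeue", "-h", "-u", user],
        capture_output=True, text=True, check=True,
    )
    return count_jobs(result.stdout)


# Example call on a login node (requires squeue on the PATH):
#   import getpass
#   print(jobs_in_queue(getpass.getuser()))
```

If this reports 2 right after a round of launches while qlaunch logs 0, the problem is likely in how the queue is being queried or parsed rather than in the scheduler itself.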

Let me know if any further information about my setup would help track this down. Thanks in advance.

-Luis