More WFs submitted than are present in lpad

Hello,

I have a total of 1300 FWs and 650 WFs in my lpad.
The last batch of WFs I added contained 218. The WFs are pretty basic; each consists of 2 FWs: turbomole define + geometry relaxation (jobex).

Nevertheless, qlaunch rapidfire won’t stop submitting jobs. We are over 10k at the moment! I had to stop it.
With previous projects, I already had the feeling that more slurm jobs were submitted than the actual number of WFs… but now it’s just too many. I am trying to figure out if this is normal.

Is there a way to know how many jobs qlaunch will submit? I have 45 block_* folders and over 18k launcher_* subfolders. Are all these “real” jobs?

Thank you

qlaunch rapidfire will submit jobs indefinitely. Look at the optional flags for more control over this.

  -m MAXJOBS_QUEUE, --maxjobs_queue MAXJOBS_QUEUE
                        maximum jobs to keep in queue for this user. 0 for no limit
  -b MAXJOBS_BLOCK, --maxjobs_block MAXJOBS_BLOCK
                        maximum jobs to put in a block
  --nlaunches NLAUNCHES
                        num_launches (int or "infinite"; default 0 is all jobs in DB)

With the default (--nlaunches 0), qlaunch keeps submitting until there are no more READY jobs in the DB; “infinite” keeps it looping even after that. So, if your jobs are all stuck waiting in the queue, their FWs stay READY and it will keep submitting.
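
To make the over-submission concrete, here is a toy model of that loop. It is not FireWorks code: the function name, the per-round rates, and the simplification that only waiting jobs count toward the cap are all assumptions made for illustration. It only shows why a launcher that submits faster than the scheduler starts jobs ends up with far more queue jobs than FWs, and how a cap like -m limits that.

```python
def simulate_rapidfire(n_ready, submits_per_round, starts_per_round, maxjobs_queue=0):
    """Toy model of non-reservation rapidfire (hypothetical, not FireWorks code).

    Returns (queue jobs submitted, queue jobs that found no FW to run).
    """
    if submits_per_round <= 0 or starts_per_round <= 0:
        raise ValueError("rates must be positive")
    ready = n_ready      # FWs still READY in the DB
    queued = 0           # queue jobs waiting to start
    submitted = 0        # total queue jobs submitted so far
    while ready > 0:
        # Launcher side: keep submitting while any FW is READY,
        # stopping early only if the queue cap (0 = no cap) is reached.
        for _ in range(submits_per_round):
            if maxjobs_queue and queued >= maxjobs_queue:
                break
            queued += 1
            submitted += 1
        # Scheduler side: a few queued jobs start; each claims one READY FW.
        started = min(queued, starts_per_round)
        queued -= started
        ready -= min(ready, started)
    # Every FW was claimed by exactly one job; the rest found nothing to run.
    return submitted, submitted - n_ready


# 650 FWs, launcher submits 50 jobs per round, cluster starts 2 per round.
print(simulate_rapidfire(650, 50, 2))                    # no cap
print(simulate_rapidfire(650, 50, 2, maxjobs_queue=10))  # cap of 10 waiting jobs
```

In this model the uncapped run submits thousands of jobs for 650 FWs, while the capped run submits only a handful more than needed. The real numbers depend on how quickly your cluster starts jobs.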

OK, but what is the best practice then, if I have N WFs to run and M slurm nodes available?

I understand I could do --nlaunches 1 and it would put all jobs into a single slurm job (and a single block).
Supposedly, --nlaunches M would occupy all resources available.

Thank you

See discussion of limitations in:
https://materialsproject.github.io/fireworks/queue_tutorial.html

Only reservation mode will keep a 1:1 correspondence between fireworks and queue submissions.
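
As a rough illustration of how these options combine on the command line, the sketch below assembles a qlaunch invocation. The helper name and the example values are hypothetical; -m and --nlaunches are the flags quoted earlier in this thread, and -r is the reservation-mode switch described in the queue tutorial linked above. It only builds and prints the command; it does not run it.

```python
import shlex


def qlaunch_rapidfire_cmd(maxjobs_queue=None, nlaunches=None, reserve=False):
    """Build a qlaunch rapidfire argument list (hypothetical helper)."""
    cmd = ["qlaunch"]
    if reserve:
        # Reservation mode: the flag goes before the subcommand.
        cmd.append("-r")
    cmd.append("rapidfire")
    if maxjobs_queue is not None:
        # Cap on this user's jobs in the queue.
        cmd += ["-m", str(maxjobs_queue)]
    if nlaunches is not None:
        # Number of queue submissions to make.
        cmd += ["--nlaunches", str(nlaunches)]
    return cmd


# Example value 50 is arbitrary; pick something suited to your cluster.
print(shlex.join(qlaunch_rapidfire_cmd(maxjobs_queue=50, reserve=True)))
# prints: qlaunch -r rapidfire -m 50
```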

To be clear on things,

> I understand I could do --nlaunches 1 and it would put all jobs into a single slurm job (and a single block).

qlaunch rapidfire --nlaunches 1 does not put all jobs into one slurm job. It submits one slurm job only. How many FWs run inside that slurm job is determined by the qadapter file. Usually the command inside the qadapter is rlaunch singleshot, which runs a single FW, not all of them, inside the queue submission.

> Supposedly, --nlaunches M would occupy all resources available.

qlaunch rapidfire --nlaunches M would submit M slurm jobs; it has nothing to do with the resources those jobs consume.
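
On the earlier question of whether all the launcher_* folders are “real” jobs, a rough way to check is to count how many of them contain evidence that a FW actually ran there. The sketch below assumes the block_*/launcher_* layout described in the question and assumes that a launcher directory in which a FW ran contains a file named FW.json; check a directory you know ran a job and adjust the marker name if your setup differs. The function name is hypothetical.

```python
from pathlib import Path


def count_launchers(root, marker="FW.json"):
    """Count launcher_* dirs under block_* dirs, and how many hold `marker`.

    `marker` is an assumption: a file expected only where a FW actually ran.
    Returns (total launcher dirs, launcher dirs containing the marker).
    """
    total = 0
    with_fw = 0
    for launcher in Path(root).glob("block_*/launcher_*"):
        if not launcher.is_dir():
            continue
        total += 1
        if (launcher / marker).exists():
            with_fw += 1
    return total, with_fw
```

Calling count_launchers(".") from the directory where qlaunch was run returns both counts; the difference between them approximates the queue submissions that never picked up a FW.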