Thousands of simultaneous small jobs

I would like to use FireWorks for a project in which we will need to run many short workflows, each with a few steps, and with each step lasting from a few minutes to a couple of hours on a single core. In principle there should always be several thousand jobs running at the same time. Up to now I have used FireWorks with much smaller numbers, and I expect that this may stress FireWorks' capacity in several ways (e.g. the need to constantly submit a large number of jobs that are also finishing at a high rate, fast growth of the DB size, and so on).

I would like to know whether any users are already aware of problems or bottlenecks that we should take into account with these kinds of requirements.

Thanks.

Guido


Hi,

I haven't run into a situation on the scale of yours before, but I have encountered issues with MongoDB and the number of available connections to the db. Typically these manifest as "too many open files" errors. These errors can be solved with a handwave if you're running the db on a unix system: https://docs.mongodb.com/manual/reference/ulimit/
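As a minimal sketch (assuming a Linux host where mongod is started from a shell; the 64000 value follows the MongoDB ulimit recommendations, and the config path is just a typical placeholder; persistent limits would instead go in /etc/security/limits.conf or the systemd unit):

```sh
# Inspect the open-files limit of the running mongod process
grep "Max open files" /proc/$(pgrep -x mongod)/limits

# Raise the limit in the shell that launches mongod, then start it
ulimit -n 64000
mongod --config /etc/mongod.conf
```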

As for a more scalable solution, you might want to look into sharding with MongoDB: https://docs.mongodb.com/manual/sharding/

Thanks,

Alex



Most of what I've heard from people on the FireWorks side is that problems start to appear when:

  • you have a lot of workflows that are each very big. e.g., each workflow has 10,000 Fireworks in it or something.

  • all jobs are expected to finish at ~the same time. e.g., if you have 10,000 Fireworks from a queue of a million running in parallel and each finishes at the same time, you have 10,000 jobs trying to do database writes at the same time. Normally, the jobs we run have a distribution of runtimes (and even the queue start times are staggered by the queueing system) so I haven’t seen this happen, but I imagine it could be a problem. It can also be a problem if many jobs start running at exactly the same time since each of them is trying to pull a job from the database.

Otherwise, you would need to try it, but I don't know of any problems from the FireWorks side other than what Alex mentioned. That said, depending on the architecture of your computing cluster, there can be problems from the cluster side. Many clusters are not really set up to run several thousands of small jobs at once. For example, some have a "MOM" node that serves as an intermediary between the head node and the compute nodes. All Python portions of the Firework are actually performed on the MOM node and only the "mpi" commands are brought over to the compute nodes. If your cluster has this kind of architecture and there is a ratio of, say, 50 compute nodes per MOM node, you might have Python processes for 50 Fireworks sharing the same MOM node, which can stress or crash it.



Dear Alex and Anubhav,

Thanks for sharing your experience on the possible problems that could come up.
One possible issue I was considering is related to the submission of the jobs. Let's suppose that I will have to run ~3000 jobs at the same time and that each job will take roughly one hour. This will require submitting a new job approximately every second. Based on my previous experience with a single qlaunch, I have the impression that it may struggle to keep the nodes full when it has to submit at such a rate, especially if some of the jobs are very short Fireworks. What would be a good strategy to ensure that the nodes are not left empty due to a lag in the submission? Should I consider options such as launching several qlaunch processes at the same time and using mlaunch to speed up the submission process? Or is it unlikely that this will be an issue at all?

Thanks,

Guido



Hi Guido,

Apart from having multiple qlaunch scripts running, there are two potential solutions to a situation like the one you describe:

  1. Currently, the qlaunch script waits 5 seconds between submissions. There is no reason for this other than to avoid overloading the queue manager (PBS, SLURM, etc.). If you think the queue system at your cluster can handle more rapid job submission, you can lower this wait time (e.g., to 1 second) by setting the QUEUE_UPDATE_INTERVAL parameter in the FW config; see the sketch after this list. You can set this as low as the queue system can handle.

  2. A different approach is to require fewer queue submissions in the first place by packing multiple jobs into a single queue submission. You can do this by changing the rocket_launch parameter in your my_qadapter.yaml to be "rlaunch rapidfire ..." instead of "rlaunch singleshot ..." (see the sketch after this list). This will allow multiple jobs to run within a single queue submission script. The only potential issue is that a Firework that starts late in the job, i.e., near the walltime, might run out of time and get killed when the walltime is hit. There are multiple ways around this. For example, if you expect each job to take about 1 hour and no more than 2 hours, and your queue job walltime is 24 hours, then you can set "--nlaunches 12" to make sure that no more than 12 jobs are run, making it very unlikely that you will hit this problem. Even with "--nlaunches 2", which is very conservative, you'd reduce the number of queue launches by a factor of two. A slightly more sophisticated version of this strategy uses the "--timeout" parameter of rlaunch: e.g., if you have a 24 hour walltime queue job and you set the "--timeout" of rlaunch to 43200 (seconds, i.e. 12 hours), then no new jobs will be launched after 12 hours have passed. This means that even the last job to be launched within that rlaunch will have at least 12 more hours before the 24 hour walltime is hit. This lets you pack more jobs into the rlaunch with more confidence than the nlaunches method: your rlaunch might be able to complete 100 jobs (if they are short) before timing out after 12 hours of launches. Just make sure that your walltime minus your timeout is enough time for a long-running job to have a fair chance at completion. And of course, if the odd job here or there does get killed due to the walltime, it will be stuck in the RUNNING state and you can easily use FWS to rerun it.
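To make these two options concrete, here is a minimal sketch; the queue type, queue name, walltime, and file paths below are placeholders that you would replace with the values for your own cluster.

```yaml
# FW_config.yaml (sketch): wait only 1 second between queue submissions
# instead of the default 5. Set this no lower than your queue manager can handle.
QUEUE_UPDATE_INTERVAL: 1
```

```yaml
# my_qadapter.yaml (sketch): pack several Fireworks into each 24-hour queue job.
# _fw_q_type, queue, walltime, and the paths are placeholders for your cluster.
_fw_name: CommonAdapter
_fw_q_type: SLURM
queue: regular
nodes: 1
walltime: '24:00:00'
rocket_launch: rlaunch -w /path/to/my_fworker.yaml -l /path/to/my_launchpad.yaml rapidfire --nlaunches 12 --timeout 43200
```

With the timeout variant, launching stops after 43200 s (12 hours), so even the last Firework to start still has at least 12 hours of the 24-hour walltime left, as described in point 2 above.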

Best,

Anubhav

