Batch Optimization in Rocketsled Does Not Run in Parallel

@ardunn

I am running batch optimization with Rocketsled, with DFT calculations as the evaluator. I have a 24-core CPU, so the objective was to run several DFT calculations in parallel using batch optimization. Although Rocketsled does provide batch suggestions, those suggestions are not run in parallel but sequentially. So if the batch_size is, say, 4, I do get 4 suggestions, but those 4 suggestions are run one after another. My workflow is set up as follows:

    optimization_task = Firework([OptTask(**db_info)], name="optimization_task")
    workflow = Workflow([fw1, fw2, optimization_task], {fw1: fw2, fw2: optimization_task})

So although the topology within each workflow is sequential (fw1 -> fw2 -> optimization_task), for a batch of N, N workflows are added. However, all N of these workflows are executed sequentially!

Is there a way to parallelize this so I can run several DFT calculations at once on a cluster?

Note: I looked into the enforce_sequential option, but I am fine with the optimizer waiting for all the batches to finish and then providing suggestions that are not duplicates. Since the DFT calculations take more or less the same time, it is worth waiting.

Hey @Awahab

How are you actually running the workflows on the cluster?

Rocketsled does not have any capability to actually run the workflows in parallel for you (that’s Fireworks); it can only submit workflows, which can then be run in parallel if you choose to run them that way.

If you want to run several workflows/fws in parallel on the same node, you’ll probably want to use rlaunch multi from fireworks to launch them in separate processes (see the fireworks docs). If you want to run several workflows/fws in parallel across multiple nodes, you just need to pull and run jobs on each node (see this part of the fireworks docs).

If you have already considered this and it is still not working, what fireworks commands are you currently using to run your workflows?

Hi @ardunn,

Thanks for the swift reply! I am basically launching Fireworks in Python, as per this previous thread. My configuration is the same as in that thread, except that the evaluator is now a DFT calculation. I am running the following:

    rapidfire(launchpad, nlaunches=100)

Revisiting our previous thread, I saw your following comment:

First, we should clear up exactly what you are timing. It seems from your code the timing is comparing 30 launches of the sequential (non-batch, or batch=1) workflow to 30 launches of the batch=15 workflow.
I’ll assume the batch_size_b=115 is a typo and you meant batch_size_b=15. Your objective function is very fast and has a basically negligible time of evaluation. So what we are really comparing in your example is the timing internally for FireWorks and Rocketsled to process two different workflows.

If the above is correct and what you intended, then the timings are pretty explainable. There are several reasons why single experiments run sequentially take longer than batches.

  1. Sequential experiments run an optimization on every workflow. Batches run an optimization once every batch_size workflows. So if you are running 30 in total, the sequential case will run 30 optimizations whereas the batch=15 case will run only 2. In this case, the optimization time is not trivial compared to the objective function (rosenbrock), so the optimization itself is the expensive step. So in one case you’re running 30 computations and in the other you’re really only running 2. This is probably the main reason for the discrepancies in timings.
  2. Submitting workflows to the launchpad and executing them in bulk (as the larger batch size does) is likely more efficient than submitting them and processing them sequentially. I wouldn’t expect this to have a large effect, though; likely only a few milliseconds’ difference in timing.
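
The arithmetic in point 1 can be checked with a tiny sketch. The function name optimizer_runs is made up for illustration, and it assumes the optimizer runs exactly once per completed batch, as described above:

```python
def optimizer_runs(total_launches, batch_size):
    """Number of optimizations if the optimizer runs once per completed batch."""
    return total_launches // batch_size


print(optimizer_runs(30, 1))   # sequential case: 30
print(optimizer_runs(30, 15))  # batch=15 case: 2
```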

Now I am using actual DFT calculations, so evaluation times are not negligible: a single DFT calculation takes up to 2 hours to run. I thought that the parallel computation part of Fireworks was abstracted from the user. Since the batch optimization code in task.py creates N workflows for a batch of size N, I thought those workflows would be run in parallel by Fireworks, whereas Rocketsled’s optimization would wait for all N to complete before finding the top N suggestions to run for the next batch.

If that is not the case, is there something that could be done?

So if you are only running the fireworks on one node, that rapidfire command you have above will necessarily run them sequentially, not in parallel.
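
As a toy illustration of why a single launcher works through a batch one job at a time, the sketch below uses only the Python standard library. It is not FireWorks code: fake_dft, run_sequential and run_parallel are hypothetical names, the sleep stands in for a DFT run, and threads stand in for separate launcher processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_dft(job_id, seconds=0.2):
    """Stand-in for a long calculation; it only sleeps."""
    time.sleep(seconds)
    return job_id


def run_sequential(job_ids):
    """One launcher: takes the jobs one after another."""
    start = time.perf_counter()
    results = [fake_dft(job_id) for job_id in job_ids]
    return results, time.perf_counter() - start


def run_parallel(job_ids, workers):
    """Several launchers at once, each taking the next free job."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_dft, job_ids))
    return results, time.perf_counter() - start


batch = [0, 1, 2, 3]  # a batch of 4 suggestions
_, t_seq = run_sequential(batch)
_, t_par = run_parallel(batch, workers=4)
print(f"sequential: {t_seq:.2f} s, parallel: {t_par:.2f} s")
```

The sequential run should take roughly the sum of the four sleeps, while the parallel run should take roughly one sleep. The same batch is submitted in both cases; only the way it is launched differs.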

To summarize parallelism in Fireworks/rocketsled:

  • Rocketsled does not itself manage parallelism apart from the submission and management of workflows. It does not run workflows itself. Running them is managed by fireworks, and you need to call the correct commands to have them run.
  • Fireworks does abstract parallelism from the user, but you still have to call it with the correct commands to pull the correct workflows from the launchpad and run them in parallel. There are several ways to use parallelism in fireworks:
      1. One node running multiple fireworks in parallel: Use this if you want multiple workflows to run at the same time on the same computer. The fireworks are managed as multiple processes. You run this with rlaunch multi (command line) or launch_multiprocess (Python).
      2. Multiple nodes running one firework: Use this if you have big calculations where each one needs to be parallelized using MPI or OpenMP. Fireworks has documentation for doing this, and you’ll need to configure your calculations to use MPI etc. In this case, once configured, you can run rlaunch on a single node and it will run that single calculation across multiple nodes.
      3. Multiple nodes each running their own firework: Use this if a single node can handle a single calculation but you want to run a bunch of them at the same time. For this you will run rlaunch on each of the nodes independently, and they will pull Fws from the launchpad and run them independently.
      4. Multiple instances of calculations where each calculation requires multiple nodes: For example, if you are going to run 5 “big” fireworks, and each needs to be parallelized over 10 nodes, you run rlaunch on the 5 head nodes and Fireworks+MPI/OMP will run these 5 big fireworks across all 50 required nodes. This is a combination of (2) and (3).

There are others as well, like running multiple instances of multiple fireworks on the same node (so kind of like (1) and (3) combined). These can all be done programmatically with submission scripts + cronjobs so as soon as you submit something to the launchpad (whether by rocketsled or manually) your cluster will pull the firework and run them automatically.

Rocketsled doesn’t concern itself with actual running of the parallel workflows (apart from managing which ones are submitted to the launchpad and how that is done), so it can handle all of the above scenarios.

Your specific case

It sounds like what you are trying to do is scenario (1). If you want to launch these kinds of fireworks in parallel from the command line, use rlaunch multi; if you want to do it in Python, use launch_multiprocess from fireworks.features.multi_launcher. If you want to launch them in a loop, put the launch_multiprocess call inside a loop.
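
A minimal sketch of the Python route, assuming the workflows are already on the launchpad and that a default FWorker is acceptable. The wrapper name launch_batch_in_parallel and its defaults are made up for illustration. The positional arguments should follow launch_multiprocess (launchpad, fworker, log level, nlaunches, num_jobs, sleep_time), but check them against your installed FireWorks version:

```python
def launch_batch_in_parallel(launchpad, num_jobs, nlaunches=0, sleep_time=60):
    """Run up to `num_jobs` fireworks at once as separate processes on this machine.

    `launchpad` is an existing fireworks LaunchPad that already holds the
    workflows (for example, the ones Rocketsled submitted).
    """
    # Imported inside the function so the sketch can be loaded without
    # FireWorks installed; in real code these belong at module level.
    from fireworks import FWorker
    from fireworks.features.multi_launcher import launch_multiprocess

    launch_multiprocess(
        launchpad,
        FWorker(),   # default worker, no special restrictions (assumption)
        "INFO",      # log level
        nlaunches,   # 0 means "until completion" in FireWorks
        num_jobs,    # number of parallel processes
        sleep_time,  # seconds to sleep between rapidfire loops
    )
```

With a batch size of 4, for example, this would be called as launch_batch_in_parallel(launchpad, num_jobs=4) in place of the single rapidfire call.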

That being said, if you are running multiple DFT calculations on a single node I would bet you are going to run into issues (memory, compute problems, etc.). I’d personally recommend running N calculations on N nodes, where each node runs its own single calculation (the scenario in (3) above). To do this, you add workflows to the launchpad (either by rocketsled or manually), then on each node you run them with rapidfire or launch_rocket.
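
For scenario (3), a sketch of the small script each node could run, assuming every node can reach the same launchpad database and has a FireWorks configuration that LaunchPad.auto_load() can find. The function name run_on_this_node and the sleep_time default are illustrative choices:

```python
def run_on_this_node(nlaunches=0, sleep_time=60):
    """Pull ready fireworks from the shared launchpad and run them, one at a time, on this node."""
    # Imported inside the function so the sketch can be loaded without
    # FireWorks installed; in real code these belong at module level.
    from fireworks import FWorker, LaunchPad
    from fireworks.core.rocket_launcher import rapidfire

    # Load the launchpad from this node's FireWorks configuration (assumption:
    # the configuration exists and points at the shared database).
    launchpad = LaunchPad.auto_load()

    # Each node runs its own rapidfire loop, so N nodes give N fireworks
    # running at the same time. nlaunches=0 means "until completion".
    rapidfire(launchpad, FWorker(), nlaunches=nlaunches, sleep_time=sleep_time)
```

Running the same script on N nodes gives N independent launchers; no node needs to know about the others, because they all pull from the same launchpad.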

Tagging @computron here if he feels this needs more clarification

@ardunn

All right, I shall try this. I believe there might be some confusion about the word “node”. By node I mean an individual process/thread of the cluster. That said, I am not planning to launch multiple DFT calculations on a single node. My objective is to launch n nodes for a batch of n. That would make sense for running multiple DFT calculations on a cluster. Usually on a cluster we use slurm to do such scheduling, running n jobs on n nodes. But since this responsibility is taken by Fireworks, I am interested in having the n workflows produced by task.py launch n fireworks in parallel; the optimizer then waits for them to complete before combining the data, retraining the GP and producing the next batch of n.

In my case it is 1 node per firework, so (3) applies, as you suggested. I shall look into your comments again and get back to you.

Hi @Awahab

Here’s a decent overview of some HPC terms including what I mean by “node”: What are standard terms used in HPC? — VSC documentation

Basically a node = 1 compute server.

I’ll be able to help you debug more in depth later next week, so maybe hold tight until then if you are still having problems!!

Hey @Awahab, were you able to figure out this problem?

Hi @ardunn,

Thanks for getting back. I just got back from our break, followed by the ACS conference. I shall look into it this week and get back to you.

Thanks again!
