Batch Optimization in Rocketsled Does Not Run in Parallel

Hi @ardunn,

thanks for the swift correspondence! I am launching FireWorks from Python, as in this previous thread. My configuration is the same as in that thread, except that the evaluator is now a DFT calculation. I am running the following:

    rapidfire(launchpad, nlaunches=100)
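For context, here is a toy sketch (standard-library only, not rocketsled/FireWorks code) of what I understand the difference to be: a single `rapidfire` call runs launches one at a time in one process, whereas parallelism would come from several workers pulling from the same queue, analogous to several launcher processes pointed at the same launchpad:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_job(job_id):
    """Stand-in for one launch: pretend to run a short calculation."""
    time.sleep(0.01)
    return job_id * job_id

jobs = list(range(8))

# Sequential: one worker runs every job, like a single rapidfire loop.
sequential = [run_job(j) for j in jobs]

# Parallel: four workers share the same job list, like four launcher
# processes pulling from the same launchpad.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(run_job, jobs))

assert sequential == parallel  # same results, wall time split across workers
print(parallel)
```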

Revisiting our previous thread, I saw your following comment:

> First, we should clear up exactly what you are timing. It seems from your code the timing is comparing 30 launches of the sequential (non-batch, or batch=1) workflow to 30 launches of the batch=15 workflow.
> I’ll assume the batch_size_b=115 is a typo and you meant batch_size_b=15. Your objective function is very fast and has a basically negligible time of evaluation. So what we are really comparing in your example is the timing internally for FireWorks and Rocketsled to process two different workflows.
>
> If the above is correct and what you intended, then the timings are pretty explainable. There are several reasons why single experiments run sequentially take longer than batches.
>
> 1. Sequential experiments run optimizations on every workflow. Batches run optimizations on every batch_size workflow. So if you are running 30 in total, the sequential will run 30 optimizations whereas the batch=15 case will run only 2. In this case, the optimization time is not trivial compared to the objective function (rosenbrock), so the optimization itself is the expensive step. So in one case you’re running 30 computations and in the other you’re really only running 2. This is probably the main reason for the discrepancies in timings.
> 2. Submitting workflows to the launchpad and executing them in bulk (as the larger batch size does) is likely more efficient than submitting them and processing them sequentially. Though I wouldn’t expect this to have a large effect, likely maybe a few milliseconds difference in timing.

Now I am using actual DFT calculations, so the evaluation time is not negligible: a single DFT calculation takes up to 2 hours to run. I thought the parallel-computation part of FireWorks was abstracted away from the user. Since the batch-optimization code in task.py creates workflows in batches of size N, I assumed FireWorks would run those workflows in parallel, while rocketsled’s optimization would wait for all N in a batch to complete before suggesting the top N candidates to run in the next batch.
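To make the pattern I am expecting concrete, here is a toy sketch (not rocketsled/FireWorks API; `evaluate` stands in for the 2-hour DFT calculation and `suggest_batch` for the optimizer, which I replace with random search just to keep it self-contained):

```python
from concurrent.futures import ThreadPoolExecutor
import random

def evaluate(x):
    """Stand-in for one ~2-hour DFT calculation."""
    return (x - 3.0) ** 2

def suggest_batch(history, batch_size, rng):
    """Stand-in for rocketsled's optimizer; here, just random search."""
    return [rng.uniform(-10.0, 10.0) for _ in range(batch_size)]

rng = random.Random(0)
history = []       # (x, y) pairs evaluated so far
batch_size = 4

for _ in range(5):  # five optimization rounds
    batch = suggest_batch(history, batch_size, rng)
    # What I expected: all N evaluations of a batch run concurrently...
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        ys = list(pool.map(evaluate, batch))
    # ...and the optimizer runs only once per batch, after all N finish.
    history.extend(zip(batch, ys))

best_x, best_y = min(history, key=lambda pair: pair[1])
print(best_x, best_y)
```

So each round costs roughly one evaluation of wall time instead of N, and the optimizer still sees all N results before proposing the next batch.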

If that is not the case, is there something that could be done?