Split Workflow across Multiple Workers

Hello,

I have been using Atomate for a few months now and have found it very convenient. There is one particular thing I'm trying to do with my workflows that I'm running into issues with, and I was wondering if anybody here knew of a workaround or potential solution.

I have access to two workers A & B, where worker A is on our local University cluster (with a much lower wait-time before running a job), and worker B is on a much more powerful national cluster (with correspondingly longer wait-time).

I would like to split my workflow across these two workers. The workflow I am running is a simple relaxation + static calculation, made up of an OptimizeFW and a StaticFW. Since the relaxation calculations are more resource-intensive and time-consuming, I would like to run the OptimizeFWs exclusively on Worker B and the comparatively fast StaticFWs on Worker A. (I have already set up my worker categories so that Worker B only pulls the OptimizeFWs.)
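For context, this is roughly how I am building the workflow and assigning categories (the category names and POSCAR path below are placeholders for my actual setup):

```python
from fireworks import LaunchPad, Workflow
from pymatgen.core import Structure
from atomate.vasp.fireworks.core import OptimizeFW, StaticFW

# Relaxation is tagged for the national cluster (Worker B) and the static for
# the local cluster (Worker A); each FireWorker's "category" setting filters
# on the "_category" spec key.
structure = Structure.from_file("POSCAR")

opt_fw = OptimizeFW(structure, spec={"_category": "national_cluster"})
static_fw = StaticFW(structure, parents=opt_fw, spec={"_category": "local_cluster"})

wf = Workflow([opt_fw, static_fw], name="relax + static across workers")
LaunchPad.auto_load().add_wf(wf)
```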

My preliminary attempts to do this have resulted in the StaticFWs fizzling on Worker A. The errors are all "no such directory exists" types of errors; looking at the code for the StaticFW, it appears to be trying to pull the structure from the directory of the previous relaxation calculation, but because that relaxation was done on Worker B, the firework is unable to find the corresponding directory on Worker A.

Is it possible to have the StaticFW pull the structure from the results database instead of the previous directory, and if not, is such a feature under consideration for a future update?

Thank you,

Anirudh

Hi Anirudh,

Unfortunately, there is currently no way to get that information from the database in the pre-built workflows; you have to copy flat files over from the structure relaxation to the static calculation.

The flat files can be copied from a different filesystem, but the pre-built workflows don't have an easy option for this. If you look at StaticFW in the code, you will see it appends a Firetask called CopyVaspOutputs. That Firetask has an option called "filesystem" which can point to a remote filesystem, assuming you have passwordless SSH set up for that remote machine. If you can set that filesystem parameter, you would be able to copy from the other cluster where the structure optimization took place.
This is not well tested, but what you are describing is certainly an in-bounds and reasonable use case.
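As a rough, untested sketch of what I mean: a custom static Firework could pass "filesystem" to CopyVaspOutputs along these lines. The remote host string, category name, and the `optimize_fw` reference are placeholders for your setup, and the surrounding firetasks just mirror atomate's usual static-calculation layout:

```python
from fireworks import Firework
from atomate.vasp.firetasks.glue_tasks import CopyVaspOutputs
from atomate.vasp.firetasks.write_inputs import WriteVaspStaticFromPrev
from atomate.vasp.firetasks.run_calc import RunVaspCustodian
from atomate.vasp.firetasks.parse_outputs import VaspToDb

# Custom static Firework: copy the relaxation outputs over SSH from the
# remote cluster before writing the static inputs.
static_fw = Firework(
    [
        CopyVaspOutputs(
            calc_loc=True,                          # look up the parent OptimizeFW's calc_locs entry
            filesystem="username@remote-cluster",   # placeholder; requires passwordless SSH
            contcar_to_poscar=True,
        ),
        WriteVaspStaticFromPrev(),
        RunVaspCustodian(vasp_cmd=">>vasp_cmd<<"),
        VaspToDb(db_file=">>db_file<<", additional_fields={"task_label": "static"}),
    ],
    name="static (remote copy)",
    parents=[optimize_fw],                          # the OptimizeFW that runs on the other worker
    spec={"_category": "local_cluster"},            # so Worker A picks it up
)
```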

Unfortunately that’s probably all the guidance I can provide - good luck!

Also, I should mention that in terms of future updates, there is nothing like this planned for atomate, but we are currently working on atomate2, where this should be possible to do through the database.

Hi Anirudh,

If you're able to build fireworks/workflows yourself, there is a way to pass data through the database by storing it in the firework spec. An example of this is the SaveStructureTask and PreviousStructureTask in mpmorph/glue_tasks.py at 9bb256d69db1c10a00be9a31f1dcf362691aaaeb · materialsproject/mpmorph · GitHub
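To give a rough idea of the pattern, here is a sketch modeled on those tasks (this is not the actual mpmorph code; the class names, spec key, and choice of MPStaticSet are just illustrative):

```python
from fireworks import explicit_serialize, FiretaskBase, FWAction
from pymatgen.core import Structure
from pymatgen.io.vasp import Vasprun
from pymatgen.io.vasp.sets import MPStaticSet


@explicit_serialize
class SaveStructureTask(FiretaskBase):
    """Appended to the relaxation Firework: push the relaxed structure into the
    spec of downstream fireworks so nothing has to be read from the old launch dir."""

    def run_task(self, fw_spec):
        # vasprun.xml may be gzipped if this runs after a compression task
        structure = Vasprun("vasprun.xml").final_structure
        return FWAction(update_spec={"prev_structure": structure.as_dict()})


@explicit_serialize
class PreviousStructureTask(FiretaskBase):
    """Prepended to the static Firework: rebuild the structure from the spec and
    write static-calculation inputs in the current launch directory."""

    def run_task(self, fw_spec):
        structure = Structure.from_dict(fw_spec["prev_structure"])
        MPStaticSet(structure).write_input(".")
```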

Also, mpmorph has its own OptimizeFW and StaticFW that would fit the use case you describe. I will warn you that these were adapted from atomate 3-4 years ago and may not be up to date with the current OptimizeFW and StaticFW in atomate. mpmorph/core.py at 9bb256d69db1c10a00be9a31f1dcf362691aaaeb · materialsproject/mpmorph · GitHub

Eric

Thank you both for your suggestions! I had noticed that the mpmorph FWs had those options, but I hadn't looked into the code too deeply. I will take a closer look at the SaveStructureTask and PreviousStructureTask and see if I can incorporate them into a custom workflow with the current OptimizeFWs and StaticFWs.