I noticed that this time 3 VASP computations got launched.
Glad we are making progress!
So is launch_multiprocess automatically distributing the 3 parallel jobs across however many processes are required to run them (i.e., running 3 fireworks in parallel)?
Yes. If each of those 3 jobs does its own internal parallelization (e.g., nested Python multiprocessing, OpenMP, or some other thread-based parallelism), each of the three jobs will use that parallelism. FireWorks does not handle this directly; it just says “ok, now it’s time to run this task in this process, so we’ll have the task do whatever it needs to do”.
So if, for example, you have a VASP calculation running 5 OpenMP threads to parallelize across bands, and you’re running 3 of them in parallel, then you get 3x5=15 threads in total. But again, managing these internal threads is not something FireWorks does directly; your calculations should be configured so that your overall scheme for parallelism actually makes sense. For example, on a single node you wouldn’t want to run 10 parallel fireworks with 20 internal threads each, because then you would have 200 threads and your node would probably just lock up.
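To make the arithmetic above concrete, here is a minimal sketch that multiplies the number of parallel fireworks by the threads each one spawns and compares the result to the cores on a node. The function names and the 40-core node are hypothetical and only for illustration; nothing here is part of the FireWorks API.

```python
def total_threads(parallel_jobs: int, threads_per_job: int) -> int:
    """Total threads when each parallel job spawns its own internal threads."""
    return parallel_jobs * threads_per_job


def oversubscription(parallel_jobs: int, threads_per_job: int, cores: int) -> float:
    """Ratio of requested threads to available cores (> 1.0 means oversubscribed)."""
    return total_threads(parallel_jobs, threads_per_job) / cores


if __name__ == "__main__":
    cores_per_node = 40  # assumption: hypothetical core count for one node

    # 3 parallel fireworks, each running 5 OpenMP threads
    print(total_threads(3, 5))

    # 10 parallel fireworks, each running 20 threads, on one node
    print(total_threads(10, 20))
    print(oversubscription(10, 20, cores_per_node))
```

A ratio well above 1.0 is the situation described above, where the jobs mostly wait on each other for CPU time.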
I would say you almost certainly do not want to use launch_multiprocess on a single node for DFT calculations. I would also guess that running more than one DFT calculation in parallel on a single node will net you almost no benefit. Why? Because if your VASP config is properly parallelized, a single calculation will already be using all the cores, and adding more calculations will just make them wait on the CPU or run out of memory. This will not be the case for (for example) 1000 nodes each running one firework; there you would see huge benefits from parallelization.
In case of which has CRAY XC50 464 nodes, launch_multiprocess won’t be able to utilize multiple nodes, only multiple processes on a single node, right?
Yes, that is exactly correct. You’ll probably want option (3) or (4), but for simplicity let’s just try (3) first.
For this, you just need to run either rapidfire or launch_rocket in the shell on all the compute nodes simultaneously. Normally this is done by:
(1) Having the workflows ready to go in your FWs launchpad (if you are using rocketsled you’ll probably only need to add the workflows once - the first loop - then they will be added automatically)
(2) Either
- (a) Submitting a bunch of compute node requests at once, where each one runs launch_rocket or rapidfire in a batch script. Each of these will then pull and run firework(s).
- (b) Having a cron job or script submit new compute node requests on a regular basis. Each requested compute node then runs a batch script that calls launch_rocket or rapidfire, so your DFT jobs run at a regular interval. For example, if you are running a batch of 100 DFT calculations in parallel across 100 nodes, and each calculation takes 2 hours, you might submit 100 new compute node requests every 2 hours. Picking the exact scheme that is right for your problem is up to you; this is just a naive suggestion.
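As a rough illustration of the scheme above, the sketch below builds the text of a one-node batch script that calls rapidfire through the rlaunch command line tool, and works out how many rounds of node requests a batch of calculations needs. Several things here are assumptions and not something stated in this thread: the Slurm-style `#SBATCH` directives (your Cray system may use a different scheduler), the wall time, and the helper names, which are hypothetical.

```python
import math


def batch_script(walltime: str, command: str = "rlaunch rapidfire") -> str:
    """Return the text of a one-node batch script.

    Assumption: Slurm directive syntax; adapt the header to your scheduler.
    """
    lines = [
        "#!/bin/bash",
        "#SBATCH --nodes=1",
        f"#SBATCH --time={walltime}",
        "",
        "# Pull and run firework(s) from the launchpad on this node.",
        command,
    ]
    return "\n".join(lines) + "\n"


def submission_rounds(n_calcs: int, n_nodes: int) -> int:
    """Number of rounds of node requests needed if each node runs one calculation per round."""
    return math.ceil(n_calcs / n_nodes)


def total_hours(n_calcs: int, n_nodes: int, hours_per_calc: float) -> float:
    """Approximate wall time, ignoring queue waits and unequal run times."""
    return submission_rounds(n_calcs, n_nodes) * hours_per_calc


if __name__ == "__main__":
    print(batch_script("02:00:00"))
    # 100 calculations on 100 nodes, 2 hours each: one round
    print(submission_rounds(100, 100), total_hours(100, 100, 2.0))
```

For scheme (a) you would write this script to a file once and submit it as many times as you want nodes; for scheme (b) a cron job or loop would resubmit it at roughly the interval that `total_hours` suggests for one round.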