I am testing FireWorks by running very simple binaries on the Hopper compute nodes under various workflow structures (sequential, parallel, etc.). My approach is to create and upload a FireWork that, via ScriptTasks, changes to the desired directory and runs the binary. To pull a FireWork onto the worker (here, Hopper), the FireWorks documentation recommends submitting a qsub script (via qlaunch) that calls rlaunch from the MOM node. Here is the script I am using, based on the one qlaunch generates:
#PBS -l mppwidth=24
#PBS -l walltime=00:01:00
#PBS -q debug
#PBS -N FW_job
#PBS -d /global/homes/s/smandala/fireworks_code/test_code/queue_scripts/hopper_results
#PBS -o FW_job.out
#PBS -e FW_job.error
rlaunch -w $HOME/fireworks_code/config_files/my_fworker.yaml -l $HOME/fireworks_code/config_files/my_launchpad.yaml rapidfire
# CommonAdapter (PBS) completed writing Template
As far as I know, though, jobs must be launched with the “aprun” command in order to use Hopper’s compute nodes. Calling rlaunch from the MOM node seemingly runs the FireWorks only on the MOM node and does not push the job to the compute nodes.
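To make the aprun concern concrete, here is a minimal sketch of one arrangement I have in mind: the ScriptTask calls a small wrapper that changes directory and then launches the binary through aprun, so that rlaunch stays on the MOM node while the binary itself runs on the compute nodes. The function name, the directory, and the binary name are hypothetical placeholders, and I do not know whether this is the recommended approach.

```shell
#!/bin/bash
# Hypothetical wrapper that a ScriptTask could call instead of the bare binary.
# run_on_compute_nodes, the work directory, and the binary name are
# illustrative assumptions, not part of FireWorks.

run_on_compute_nodes() {
    local work_dir="$1"   # directory the binary should run in
    local nprocs="$2"     # number of processes to request from aprun
    local binary="$3"     # path to the binary, relative to work_dir

    # Move to the desired directory; give up if it does not exist.
    cd "$work_dir" || return 1

    # aprun places the binary on the compute nodes allocated to this job.
    aprun -n "$nprocs" "$binary"
}

# Example call (commented out; the paths are placeholders):
# run_on_compute_nodes "$HOME/some_work_dir" 24 ./my_binary
```

With mppwidth=24 as in the script above, passing 24 as the process count would use all of the requested cores.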
Overall, I would like to know how to run my FireTasks on the compute nodes. Ideally, I would start several compute nodes in “rapid-fire” mode so that they retrieve and complete the pending FireWorks (for workflows involving parallelizable jobs). I can think of a couple of ways to get the FireWorks pushed to the compute nodes, but I am not sure whether my “methodology” or any of these solutions would be optimal. Any information or tips on how to go about this would be a big help.