Dear Anubhav,
Thank you very much for your quick reply and helpful information! I really appreciate it.
You were right: I was fundamentally misunderstanding FireWork vs. FireWorker. Thank you for helping me fix this and arrive at the correct understanding: the FireWorker is essentially a job configuration file, while the FireWork describes the workflow itself.
With this missing piece, I was finally able to run my desired workflow, so thank you!
I do have one final question, though. With my current setup, I noticed that I have to issue three qlaunch commands to get all of my FireWorks to run:
qlaunch -q my_qadapter1.yaml -w my_fworker1.yaml singleshot #launches step 1
qlaunch -q my_qadapter2.yaml -w my_fworker2.yaml singleshot #launches step 2
qlaunch -q my_qadapter1.yaml -w my_fworker1.yaml singleshot #needed to launch step 3, which is in state WAITING
I was naively expecting that I would only have to issue two qlaunch commands, one for each queue adapter configuration. Is this a consequence of the workflow structure I have chosen, or is there some additional configuration setting or strategy I should use?
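To make the question concrete, here is a toy model of the behavior described above. It is only a sketch and not FireWorks code: every function and variable name is hypothetical. It assumes that each singleshot launch runs at most one READY FireWork whose _category matches the worker, and that a child FireWork becomes READY only once all of its parents are COMPLETED.

```python
# Toy model only: this is NOT FireWorks code, and every name here is hypothetical.
CATEGORY = {1: "onenode", 2: "twonode", 3: "onenode"}  # _category of each fw_id
LINKS = {1: [2], 2: [3]}                                # parent fw_id -> child fw_ids


def initial_states():
    """FireWorks with no parents start READY; all others start WAITING."""
    children = {c for kids in LINKS.values() for c in kids}
    return {fw: ("WAITING" if fw in children else "READY") for fw in CATEGORY}


def singleshot(states, category):
    """Run at most one READY FireWork of the given category.

    Returns the fw_id that ran, or None if nothing matched.
    """
    for fw in sorted(states):
        if states[fw] == "READY" and CATEGORY[fw] == category:
            states[fw] = "COMPLETED"
            # A child becomes READY once all of its parents are COMPLETED.
            for child in LINKS.get(fw, []):
                parents = [p for p, kids in LINKS.items() if child in kids]
                if all(states[p] == "COMPLETED" for p in parents):
                    states[child] = "READY"
            return fw
    return None


def run(launch_order):
    """Apply one singleshot launch per category in launch_order."""
    states = initial_states()
    ran = [singleshot(states, category) for category in launch_order]
    return ran, states


if __name__ == "__main__":
    print(run(["onenode", "twonode"]))
    print(run(["onenode", "twonode", "onenode"]))
```

In this model, two launches (one per category) leave step 3 READY but never run, because step 3 was still WAITING when the first onenode launch happened. A third onenode launch is what completes it, which matches what I observed.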
For completeness, here is my whole workflow:
stephey@perlmutter:login02:~> module load python
stephey@perlmutter:login02:~> conda activate fireworks
(/global/common/software/das/stephey/conda/conda_envs/fireworks) stephey@perlmutter:login02:~> cd /pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC
(/global/common/software/das/stephey/conda/conda_envs/fireworks) stephey@perlmutter:login02:/pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC> lpad reset
Are you sure? This will RESET 1 workflows and all data. (Y/N)y
2023-04-02 20:52:07,940 INFO Performing db tune-up
2023-04-02 20:52:08,001 INFO LaunchPad was RESET.
(/global/common/software/das/stephey/conda/conda_envs/fireworks) stephey@perlmutter:login02:/pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC> cat fw_diabetes_wf.yaml
fws:
- fw_id: 1
  spec:
    _category: onenode
    _tasks:
    - _fw_name: ScriptTask
      script: srun python step_1_diabetes_preprocessing.py
- fw_id: 2
  spec:
    _category: twonode
    _tasks:
    - _fw_name: ScriptTask
      script: srun -n 10 python step_2_diabetes_correlation.py
- fw_id: 3
  spec:
    _category: onenode
    _tasks:
    - _fw_name: ScriptTask
      script: srun python step_3_diabetes_postprocessing.py
links:
  1:
  - 2
  2:
  - 3
metadata: {}
(/global/common/software/das/stephey/conda/conda_envs/fireworks) stephey@perlmutter:login02:/pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC> lpad add fw_diabetes_wf.yaml
2023-04-02 20:59:16,444 INFO Added a workflow. id_map: {1: 1, 2: 2, 3: 3}
(/global/common/software/das/stephey/conda/conda_envs/fireworks) stephey@perlmutter:login02:/pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC> cat my_fworker1.yaml
name: one node fireworker
category: onenode
query: '{}'
(/global/common/software/das/stephey/conda/conda_envs/fireworks) stephey@perlmutter:login02:/pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC> cat my_qadapter1.yaml
_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch -w my_fworker1.yaml -l my_launchpad.yaml singleshot
constraint: cpu
nodes: 1
ntasks: 1
account: nstaff
walltime: '00:05:00'
queue: debug
job_name: null
logdir: null
pre_rocket: null
post_rocket: null
(/global/common/software/das/stephey/conda/conda_envs/fireworks) stephey@perlmutter:login02:/pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC> qlaunch -q my_qadapter1.yaml -w my_fworker1.yaml singleshot
2023-04-02 21:00:33,387 INFO moving to launch_dir /pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC
2023-04-02 21:00:33,390 INFO submitting queue script
2023-04-02 21:00:34,003 INFO Job submission was successful and job_id is 6895033
(/global/common/software/das/stephey/conda/conda_envs/fireworks) stephey@perlmutter:login02:/pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC> lpad get_fws
[
    {
        "fw_id": 1,
        "created_on": "2023-04-03T03:59:16.437426",
        "updated_on": "2023-04-03T04:00:43.350942",
        "state": "COMPLETED",
        "name": "Unnamed FW"
    },
    {
        "fw_id": 2,
        "created_on": "2023-04-03T03:59:16.437581",
        "updated_on": "2023-04-03T04:00:43.352286",
        "state": "READY",
        "name": "Unnamed FW"
    },
    {
        "fw_id": 3,
        "created_on": "2023-04-03T03:59:16.437671",
        "updated_on": "2023-04-03T03:59:16.437671",
        "name": "Unnamed FW",
        "state": "WAITING"
    }
]
(/global/common/software/das/stephey/conda/conda_envs/fireworks) stephey@perlmutter:login02:/pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC> qlaunch -q my_qadapter2.yaml -w my_fworker2.yaml singleshot
2023-04-02 21:01:03,996 INFO moving to launch_dir /pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC
2023-04-02 21:01:03,997 INFO submitting queue script
2023-04-02 21:01:04,101 INFO Job submission was successful and job_id is 6895035
(/global/common/software/das/stephey/conda/conda_envs/fireworks) stephey@perlmutter:login02:/pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC> lpad get_fws
[
    {
        "fw_id": 1,
        "created_on": "2023-04-03T03:59:16.437426",
        "updated_on": "2023-04-03T04:00:43.350942",
        "state": "COMPLETED",
        "name": "Unnamed FW"
    },
    {
        "fw_id": 2,
        "created_on": "2023-04-03T03:59:16.437581",
        "updated_on": "2023-04-03T04:01:09.537061",
        "state": "RUNNING",
        "name": "Unnamed FW"
    },
    {
        "fw_id": 3,
        "created_on": "2023-04-03T03:59:16.437671",
        "updated_on": "2023-04-03T03:59:16.437671",
        "name": "Unnamed FW",
        "state": "WAITING"
    }
]
(/global/common/software/das/stephey/conda/conda_envs/fireworks) stephey@perlmutter:login02:/pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC> qlaunch -q my_qadapter1.yaml -w my_fworker1.yaml singleshot
2023-04-02 21:01:22,716 INFO moving to launch_dir /pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC
2023-04-02 21:01:22,718 INFO submitting queue script
2023-04-02 21:01:22,810 INFO Job submission was successful and job_id is 6895040
(/global/common/software/das/stephey/conda/conda_envs/fireworks) stephey@perlmutter:login02:/pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC> lpad get_fws
[
    {
        "fw_id": 1,
        "created_on": "2023-04-03T03:59:16.437426",
        "updated_on": "2023-04-03T04:00:43.350942",
        "state": "COMPLETED",
        "name": "Unnamed FW"
    },
    {
        "fw_id": 2,
        "created_on": "2023-04-03T03:59:16.437581",
        "updated_on": "2023-04-03T04:01:11.994916",
        "state": "COMPLETED",
        "name": "Unnamed FW"
    },
    {
        "fw_id": 3,
        "created_on": "2023-04-03T03:59:16.437671",
        "updated_on": "2023-04-03T04:01:30.222269",
        "state": "COMPLETED",
        "name": "Unnamed FW"
    }
]
(/global/common/software/das/stephey/conda/conda_envs/fireworks) stephey@perlmutter:login02:/pscratch/sd/s/stephey/DOE-HPC-workflow-training/FireWorks/NERSC>
Thank you again,
Laurie