I am using FireWorks to run multiple LAMMPS simulations on NERSC. In my setup, each firework includes a TemplateWriterTask (to create the input file) and a ScriptTask (to perform
srun). I have things up and running on the GPU nodes. However, I am encountering difficulties on the Haswell nodes. Specifically, in my
my_qadapter.yaml file I use the following options (in addition to specifying Haswell, wall clock time, etc.):
The srun command for each firework is:
srun -N 1 -n 8 -c 1 --cpu-bind=cores /usr/common/software/lammps/patch_20Nov2019/hsw/bin/lmp -in in.file
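For concreteness, a firework of this shape could be sketched roughly as below. This is an illustration, not my exact file: the template name `lammps_in.template` and the `temperature` context key are hypothetical placeholders, and the helper `build_firework_spec` is just a name made up for this sketch. The `_fw_name`, `template_file`, `context`, `output_file` and `script` keys follow the FireWorks file format as I understand it, and the script is the srun line quoted above.

```python
import json

# The srun line quoted above, reused verbatim as the ScriptTask script.
SRUN_CMD = (
    "srun -N 1 -n 8 -c 1 --cpu-bind=cores "
    "/usr/common/software/lammps/patch_20Nov2019/hsw/bin/lmp -in in.file"
)


def build_firework_spec(context):
    """Return a dict describing one firework with two tasks.

    Task 1 renders the LAMMPS input file from a template; task 2 runs srun.
    The template file name is a hypothetical placeholder.
    """
    return {
        "spec": {
            "_tasks": [
                {
                    "_fw_name": "TemplateWriterTask",
                    "template_file": "lammps_in.template",  # hypothetical name
                    "context": context,  # values substituted into the template
                    "output_file": "in.file",  # file passed to lmp via -in
                },
                {
                    "_fw_name": "ScriptTask",
                    "script": SRUN_CMD,
                },
            ]
        }
    }


if __name__ == "__main__":
    # Hypothetical context: a single template variable.
    fw = build_firework_spec({"temperature": 300})
    print(json.dumps(fw, indent=2))
```

The printed JSON, saved to a file, should be something that `lpad add` can load, assuming the template file exists where TemplateWriterTask looks for it.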
When I launch the firework and look in the launcher directory, I see the following in the error file:
srun: error: gres_plugin_job_state_unpack: no plugin configured to unpack data type 4047587904 from job 40594442
srun: gres_plugin_step_state_unpack: no plugin configured to unpack data type 4047587904 from step 40594442.0
srun: error: Task launch for 40594442.0 failed on node nid00891: Invalid job credential
srun: error: Application launch failed: Invalid job credential
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: Timed out waiting for job step to complete
In the launcher folder, I can see that the TemplateWriterTask completed successfully and all the input files were generated correctly. I am a little perplexed, because when I run similar commands outside FireWorks (e.g. sbatch a job script containing an identical srun command), everything runs without error.
I’d be grateful for any guidance or suggestions on this issue. Am I missing something obvious?