I am struggling to run the magnetic ordering workflow on the Stampede2 cluster. I can generate workflows and add them to the FireWorks database, but I have not been able to run VASP jobs with the magnetic ordering workflow in particular. Other workflows, such as a relaxation plus static calculation or band structure calculations, work fine. For context, this workflow was generated from the mp-13 Fe structure. The workflow starts, but then fizzles after a timeout. The time limit itself does not seem to be the problem, since the calculation should be quick for iron.
My queue adapter configuration is:
_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch -c /home1/09282/devonmse/atomate/config rapidfire
nodes: 1
ntasks_per_node: 64
walltime: 4:00:00
queue: normal
account: TG-MAT210016
job_name: null
mail_type: "START,END"
mail_user: [email protected]
pre_rocket: conda activate mag_order
post_rocket: null
logdir: /home1/09282/devonmse/atomate/logs
In each launcher directory the std_err.txt file repeats the following message:
c418-051.stampede2.tacc.utexas.edu.220217PSM2 can't open hfi unit: -1 (err=23)
MPI startup(): tmi fabric is not available and fallback fabric is not enabled
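To confirm that every launch hits the same fabric error, rather than only a few, the std_err.txt files can be scanned in one pass. The following is a minimal sketch, not part of atomate or FireWorks. The function name `find_fabric_errors` is hypothetical, and it assumes the launcher directories follow the `launcher_*` naming pattern and sit directly under a common root directory.

```python
from pathlib import Path


def find_fabric_errors(root, pattern="can't open hfi unit"):
    """Map each launcher directory under root to its count of matching lines.

    Assumes (hypothetically) that launch directories are named launcher_*
    and that each one contains a std_err.txt file.
    """
    counts = {}
    for err_file in sorted(Path(root).glob("launcher_*/std_err.txt")):
        # errors="replace" avoids crashing on stray non-UTF-8 bytes in the log.
        text = err_file.read_text(errors="replace")
        n = sum(1 for line in text.splitlines() if pattern in line)
        if n:
            counts[err_file.parent.name] = n
    return counts
```

Calling `find_fabric_errors(".")` from the directory that holds the launchers returns a dictionary of directory names and hit counts. An empty dictionary means no std_err.txt contained the message.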
Additionally, the bottom of each OUTCAR has the following warning:
I interpret this warning as VASP asking me to rerun the job, using the CONTCAR from the failed job as the POSCAR for a new job. However, this seems awkward to implement for each firework.
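For reference, the manual restart described above amounts to something like the sketch below for a single job directory. This is only an illustration using the Python standard library, not an atomate or custodian feature. The function name `restart_from_contcar` and the backup file name `POSCAR.orig` are hypothetical choices.

```python
import shutil
from pathlib import Path


def restart_from_contcar(job_dir, backup_name="POSCAR.orig"):
    """Copy CONTCAR over POSCAR in job_dir, backing up the old POSCAR first.

    Returns True if the copy was made, False if CONTCAR is missing or empty.
    """
    job_dir = Path(job_dir)
    contcar = job_dir / "CONTCAR"
    poscar = job_dir / "POSCAR"

    # An empty or absent CONTCAR has no structure to restart from.
    if not contcar.is_file() or contcar.stat().st_size == 0:
        return False

    # Keep the original input structure so the step can be undone.
    if poscar.is_file():
        shutil.copy2(poscar, job_dir / backup_name)
    shutil.copy2(contcar, poscar)
    return True
```

This would have to be run in every failed launcher directory before resubmitting, which is the per-firework overhead I would like to avoid.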