Magnetic ordering workflow:

I am struggling to run the magnetic ordering workflow on the Stampede2 cluster. I can generate workflows and upload them to the FireWorks database, but I have not been able to run VASP jobs with the magnetic ordering workflow in particular. Other workflows, such as a relaxation plus static calculation or band structure calculations, work fine. For context, this workflow was generated from the mp-13 Fe structure. The workflow starts but then fizzles after a timeout. Run time does not seem to be the real problem, though, since the calculation should be quick for iron.

my_qadapter.yaml

_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch -c /home1/09282/devonmse/atomate/config rapidfire
nodes: 1
ntasks_per_node: 64
walltime: 4:00:00
queue: normal
account: TG-MAT210016 
job_name: null
mail_type: "START,END"
mail_user: [email protected]
pre_rocket: conda activate mag_order
post_rocket: null
logdir: /home1/09282/devonmse/atomate/logs

In each launcher directory the std_err.txt file repeats the following message:

c418-051.stampede2.tacc.utexas.edu.220217PSM2 can't open hfi unit: -1 (err=23)
[37] MPI startup(): tmi fabric is not available and fallback fabric is not enabled 
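
To check whether every launcher directory hits the same two messages, a small tally script can help. This is only a sketch using the Python standard library: the function names are made up for illustration, and the `launcher_*/std_err.txt` glob pattern is an assumption about how the launch directories are laid out, so adjust it to the actual paths.

```python
from collections import Counter
from pathlib import Path

# Substrings taken from the two messages quoted above.
PATTERNS = ("can't open hfi unit", "tmi fabric is not available")


def tally_messages(text, patterns=PATTERNS):
    """Count how many lines of `text` contain each substring in `patterns`."""
    counts = Counter({p: 0 for p in patterns})
    for line in text.splitlines():
        for p in patterns:
            if p in line:
                counts[p] += 1
    return counts


def tally_launchers(root):
    """Map each launcher directory name under `root` to its message tally.

    Assumes (hypothetically) that launch directories match `launcher_*`
    and each contains a std_err.txt file.
    """
    results = {}
    for err_file in sorted(Path(root).glob("launcher_*/std_err.txt")):
        text = err_file.read_text(errors="replace")
        results[err_file.parent.name] = tally_messages(text)
    return results
```

Calling `tally_launchers(".")` from the directory that holds the launcher folders returns one tally per folder, which makes it easy to see whether the MPI startup message appears in every launch or only in some.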

Additionally, the bottom of each OUTCAR has the following warning:

[Screenshot 2023-05-24 at 2.08.52 PM: image of the OUTCAR warning, not reproduced here]

I interpret this warning as VASP asking me to rerun the job, using the CONTCAR from the failed job as the POSCAR for the new one. However, that seems awkward to implement for each firework.

Hi Devon,

The magnetic ordering workflow in atomate just runs a series of standard VASP calculations (relaxations and static energies), so the workflow itself is unlikely to be responsible; this is more likely a VASP issue.

Two thoughts:

  1. Have you tried the smaller EDIFF as suggested? Or reviewed how Custodian attempts to correct this error?

  2. You could share your complete VASP output directory for the bad job here (except the POTCAR, which you should not post). There’s not quite enough information to diagnose the problem yet.
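
For the first point, here is a minimal sketch of tightening EDIFF in an INCAR by hand, using only the Python standard library. The helper name is hypothetical, the value `1E-6` is purely illustrative (use whatever the warning suggests), and the code assumes one tag per line in the INCAR.

```python
import re


def set_incar_tag(incar_text, tag, value):
    """Return INCAR text with `tag` set to `value`.

    Replaces the first existing `tag = ...` line, or appends a new line if
    the tag is absent. Assumes one tag per line (no ';'-separated tags).
    """
    # Requiring '=' right after the tag keeps EDIFF from matching EDIFFG.
    pattern = re.compile(
        rf"^[ \t]*{re.escape(tag)}[ \t]*=.*$", re.IGNORECASE | re.MULTILINE
    )
    new_line = f"{tag} = {value}"
    if pattern.search(incar_text):
        return pattern.sub(lambda m: new_line, incar_text, count=1)
    sep = "" if incar_text.endswith("\n") or not incar_text else "\n"
    return f"{incar_text}{sep}{new_line}\n"


# Example: tighten EDIFF while leaving EDIFFG untouched.
example = "ENCUT = 520\nEDIFF = 1E-4\nEDIFFG = -0.02\n"
updated = set_incar_tag(example, "EDIFF", "1E-6")
```

This is only meant for a quick manual test on one failed job, to see whether a tighter EDIFF makes the warning go away.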

Best,

Matt

Hey Matt,

I tried a lower-symmetry structure at the suggestion of one of my colleagues, and the ZBRENT error disappeared. However, the job still fails. Honestly, I don’t know how to inspect Custodian’s correction attempts. Which output files document them?

Here is a link to a drive with the VASP output directory: vasp outputs - Google Drive

Thanks,

Devon