Band structure workflow

Hi, I’m new to atomate and I am trying to calculate the Si band structure following the steps in https://atomate.org/running_workflows.html. I ran the workflow on NERSC Cori. It fizzled at the first step, the structure optimization, leaving all the following fireworks waiting. However, the first step seems to have completed: the OUTCAR looks good and there are no error messages in the test_job.error file. The only odd thing is that in the test_job.out file the last task does not appear to have completed:

2020-11-10 14:37:35,762 INFO Hostname/IP lookup (this will take a few seconds)
2020-11-10 14:37:35,785 INFO Launching Rocket
2020-11-10 14:38:12,658 INFO RUNNING fw_id: 4 in directory: /global/u1/w/wygong/wf/test/Si
2020-11-10 14:38:12,720 INFO Task started: FileWriteTask.
2020-11-10 14:38:12,731 INFO Task completed: FileWriteTask
2020-11-10 14:38:12,742 INFO Task started: {{atomate.vasp.firetasks.write_inputs.WriteVaspFromIOSet}}.
2020-11-10 14:38:13,355 INFO Task completed: {{atomate.vasp.firetasks.write_inputs.WriteVaspFromIOSet}}
2020-11-10 14:38:13,367 INFO Task started: {{atomate.vasp.firetasks.run_calc.RunVaspCustodian}}.

I wonder if this could be the cause of the fizzled jobs. Thank you!

Hi,

Those logging messages don’t contain any errors. Is that the end of the messages or are there more after that?

If that is the last message, that would suggest that your calculation did not finish and the wall time for the job ran out. You can check if VASP finished running by looking for the timing information at the bottom of the OUTCAR. If VASP didn’t finish running then you should try increasing the wall time.
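The OUTCAR check described above can be scripted. This is only a minimal sketch: it assumes that a normally finished VASP run ends its OUTCAR with the "General timing and accounting informations for this job" block, and the function names are made up for illustration.

```python
import os

# Header of the timing block that VASP writes at the end of OUTCAR
# after a normal exit (assumed marker; check it against your own OUTCAR).
TIMING_MARKER = "General timing and accounting informations for this job"


def vasp_finished(outcar_text: str) -> bool:
    """Return True if the given OUTCAR text contains the final timing block."""
    return TIMING_MARKER in outcar_text


def outcar_finished(path: str = "OUTCAR", tail_bytes: int = 20000) -> bool:
    """Read only the last `tail_bytes` bytes of the file and look for the marker."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        fh.seek(max(0, size - tail_bytes))
        tail = fh.read().decode("utf-8", errors="replace")
    return vasp_finished(tail)
```

Calling `outcar_finished("OUTCAR")` in the launch directory returns False when the job was killed before VASP wrote the timing block, which is what you would expect if the wall time ran out.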

If the calculation did finish, then there should be error messages elsewhere. Can you look at the failed firework in the webgui to see if there are any further errors?
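If the webgui is hard to navigate, the same information can usually be read from the firework document itself, for example by saving the output of `lpad get_fws -i <fw_id> -d all` to a file. The sketch below assumes that FireWorks stores the exception of a fizzled launch under `action.stored_data._exception._stacktrace`; the helper names and the `fw.json` file name are hypothetical.

```python
import json
from typing import Any, Dict, List


def load_fw_doc(path: str) -> Dict[str, Any]:
    """Load a firework document saved as JSON (accepts a single dict or a one-item list)."""
    with open(path) as fh:
        doc = json.load(fh)
    return doc[0] if isinstance(doc, list) else doc


def fizzled_tracebacks(fw_doc: Dict[str, Any]) -> List[str]:
    """Collect the stack traces recorded for the FIZZLED launches of one firework."""
    traces = []
    for launch in fw_doc.get("launches", []):
        if launch.get("state") != "FIZZLED":
            continue
        # Assumed layout: action -> stored_data -> _exception -> _stacktrace
        stored = (launch.get("action") or {}).get("stored_data") or {}
        trace = (stored.get("_exception") or {}).get("_stacktrace")
        if trace:
            traces.append(trace)
    return traces


# Hypothetical usage, after `lpad get_fws -i 1 -d all > fw.json`:
# for trace in fizzled_tracebacks(load_fw_doc("fw.json")):
#     print(trace)
```

An empty result would mean that no traceback was stored for that firework, in which case the job output and log files are the next place to look.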

Best,
Alex

Hi Alex,

Thank you for your reply. Below are all messages generated during the calculation:

test_job.error: empty

test_job.out:
2020-11-13 13:55:51,521 INFO Hostname/IP lookup (this will take a few seconds)
2020-11-13 13:55:51,539 INFO Launching Rocket
2020-11-13 13:56:32,007 INFO RUNNING fw_id: 1 in directory: /global/u1/w/wygong/wf/test/Si_static
2020-11-13 13:56:32,033 INFO Task started: FileWriteTask.
2020-11-13 13:56:32,043 INFO Task completed: FileWriteTask
2020-11-13 13:56:32,047 INFO Task started: FileWriteTask.
2020-11-13 13:56:32,052 INFO Task completed: FileWriteTask
2020-11-13 13:56:32,056 INFO Task started: {{atomate.vasp.firetasks.write_inputs.WriteVaspFromIOSet}}.
2020-11-13 13:56:32,333 INFO Task completed: {{atomate.vasp.firetasks.write_inputs.WriteVaspFromIOSet}}
2020-11-13 13:56:32,337 INFO Task started: {{atomate.vasp.firetasks.run_calc.RunVaspCustodian}}.

std_err.txt: empty

launchpad-debug.log:
2020-11-13 12:30:12,476 DEBUG RESTARTED fw_id, launch_id to (1, 1)
2020-11-13 12:30:12,479 INFO Performing db tune-up
2020-11-13 12:30:12,479 DEBUG Updating indices…
2020-11-13 12:30:12,509 INFO LaunchPad was RESET.
2020-11-13 12:30:33,919 INFO Added a workflow. id_map: {-1: 1}
2020-11-13 12:30:48,267 DEBUG FW with id: 1 is unique!
2020-11-13 13:56:31,976 DEBUG FW with id: 1 is unique!
2020-11-13 13:56:31,987 DEBUG Created/updated Launch with launch_id: 1
2020-11-13 13:56:32,006 DEBUG RUNNING FW with id: 1

launchpad-error.log: empty

queue_launcher-error.log: empty

rocket_launcher-debug.log:
2020-11-13 13:55:51,539 INFO Launching Rocket
2020-11-13 13:56:32,007 INFO RUNNING fw_id: 1 in directory: /global/u1/w/wygong/wf/test/Si_static
2020-11-13 13:56:32,033 INFO Task started: FileWriteTask.
2020-11-13 13:56:32,043 INFO Task completed: FileWriteTask
2020-11-13 13:56:32,047 INFO Task started: FileWriteTask.
2020-11-13 13:56:32,052 INFO Task completed: FileWriteTask
2020-11-13 13:56:32,056 INFO Task started: {{atomate.vasp.firetasks.write_inputs.WriteVaspFromIOSet}}.
2020-11-13 13:56:32,333 INFO Task completed: {{atomate.vasp.firetasks.write_inputs.WriteVaspFromIOSet}}
2020-11-13 13:56:32,337 INFO Task started: {{atomate.vasp.firetasks.run_calc.RunVaspCustodian}}.
2020-11-13 13:57:34,315 INFO Task completed: {{atomate.vasp.firetasks.run_calc.RunVaspCustodian}}
2020-11-13 13:57:34,320 INFO Task started: {{atomate.common.firetasks.glue_tasks.PassCalcLocs}}.
2020-11-13 13:57:34,320 INFO Task completed: {{atomate.common.firetasks.glue_tasks.PassCalcLocs}}
2020-11-13 13:57:34,325 INFO Task started: {{atomate.vasp.firetasks.parse_outputs.VaspToDb}}.
2020-11-13 13:57:47,157 INFO Rocket finished

I also double-checked the OUTCAR, and it did finish with the timing information.
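The task logs above can also be checked mechanically by pairing each "Task started" line with a later "Task completed" line. This is a rough sketch, assuming the log format shown in this thread; the function name is made up, and the result only shows which task was left open, not why.

```python
from typing import List


def unfinished_tasks(log_text: str) -> List[str]:
    """Return the names of tasks that were started but never reported as completed."""
    open_tasks: List[str] = []
    for line in log_text.splitlines():
        if "Task started: " in line:
            # "Task started" lines end with a period; strip it to get the bare name.
            name = line.split("Task started: ", 1)[1].strip().rstrip(".")
            open_tasks.append(name)
        elif "Task completed: " in line:
            name = line.split("Task completed: ", 1)[1].strip().rstrip(".")
            if name in open_tasks:
                open_tasks.remove(name)
    return open_tasks


# Last lines of rocket_launcher-debug.log, copied from above.
sample = """\
2020-11-13 13:56:32,337 INFO Task started: {{atomate.vasp.firetasks.run_calc.RunVaspCustodian}}.
2020-11-13 13:57:34,315 INFO Task completed: {{atomate.vasp.firetasks.run_calc.RunVaspCustodian}}
2020-11-13 13:57:34,320 INFO Task started: {{atomate.common.firetasks.glue_tasks.PassCalcLocs}}.
2020-11-13 13:57:34,320 INFO Task completed: {{atomate.common.firetasks.glue_tasks.PassCalcLocs}}
2020-11-13 13:57:34,325 INFO Task started: {{atomate.vasp.firetasks.parse_outputs.VaspToDb}}.
2020-11-13 13:57:47,157 INFO Rocket finished
"""

print(unfinished_tasks(sample))
# ['{{atomate.vasp.firetasks.parse_outputs.VaspToDb}}']
```

For the excerpt above, the only task left without a "Task completed" line is VaspToDb, whereas RunVaspCustodian is reported as completed in rocket_launcher-debug.log.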

As you suggested, I ran the “lpad webgui” command, but I’m not sure where to find the error information. What I got is shown below:

Could it be due to incorrect config files? Below are my configs:

my_fworker.yaml:
name: NERSC_fworker
category: ''
query: '{}'
env:
    db_file: /global/homes/w/wygong/wf/config/db.json
    vasp_cmd: 'srun -n 32 -c 2 --cpu_bind=cores vasp_std'
    gamma_vasp_cmd: 'srun -n 32 -c 2 --cpu_bind=cores vasp_gam'
    scratch_dir: /global/cscratch1/sd/wygong/
    incar_update:

my_launchpad.yaml:
host: mongodb07.NERSC.gov
port: 27017
name:
username:
password:
logdir: /global/homes/w/wygong/wf/logs
strm_lvl: DEBUG
user_indices: []
wf_user_indices: []

my_qadaptor.yaml:
_fw_name: CommonAdapter
_fw_q_type: SLURM
_fw_template_file: /global/homes/w/wygong/.conda/envs/workflow/config/SLURM_VASP_template
time: 00:30:00
partition: regular
nodes: 1
system: haswell
job_name: test_job
account: null
pre_rocket: module load vasp
rocket_launch: rlaunch -w /global/homes/w/wygong/.conda/envs/workflow/config/my_fworker.yaml singleshot
post_rocket: null

Thanks!
Weiyi