Error in running atomate in offline mode?

Hi Hang,

I agree the immediate problem is likely your vasp_cmd (judging from your std_err.txt; also check vasp.out). I am not sure how jobs are supposed to be run at your specific computing center, or how to make sure those environment variables are available on your compute node.

To get this working in the short term, my next step would likely be to add some print statements after this code:

atomate/vasp/firetasks/run_calc.py:116

if isinstance(vasp_cmd, six.string_types):
    vasp_cmd = os.path.expandvars(vasp_cmd)
    vasp_cmd = shlex.split(vasp_cmd)


This will show what vasp_cmd is actually being passed to Custodian. It is possible you will need to expand the environment variables yourself before the command is handed to Custodian.
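For example, here is a minimal diagnostic sketch (nothing atomate-specific beyond mimicking the expansion it performs; the mpirun string is just copied from your message below). You could run something like this inside an LSF job to see what the expansion actually produces:

import os
import shlex

# This mimics the expansion atomate does before handing the command to Custodian.
# os.path.expandvars leaves any variable that is NOT set in the environment as the
# literal "$NAME", which matches the "$CURDIR/nodelist.41419" mpiexec complains about.
cmd = "mpirun -np $NPROCS -machinefile $CURDIR/nodelist.$LSB_JOBID /home1/xiaoh/vasp/vasp.5.3-vtst/vasp"
expanded = os.path.expandvars(cmd)
print(expanded)
print(shlex.split(expanded))

# Check which of the three variables are actually defined on the compute node.
for name in ("NPROCS", "CURDIR", "LSB_JOBID"):
    print(name, "=", os.environ.get(name))

If NPROCS and CURDIR only get defined inside your own submission script, one possible workaround (assuming the FireWork is launched from within the LSF job, so it inherits that environment) is to export them in your LoadSharingFacility template before the launch command, so that the expansion above sees real values.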



However, a more pressing concern is what happens even after you get this running. Atomate is not really designed for offline mode, and you will run into problems with database insertion of the results as soon as the VASP job finishes. This has been discussed in previous messages on this list; see, for example:

https://groups.google.com/forum/#!searchin/atomate/offline%7Csort:date/atomate/dggbBsK628Q/2653lXFGAwAJ



On Tuesday, February 5, 2019 at 12:40:24 PM UTC-8, Hang Xiao wrote:

Dear all,

I would like to run atomate in offline mode on a cluster with the LSF job management system. I have attached all the relevant files from a test run in a zip file.
The error I got in "Si-structure_optimiz-41419.error" is:

/home1/xiaoh/miniconda3/lib/python3.7/site-packages/pymatgen/io/cif.py:44: UserWarning: Please install optional dependency pybtex if youwant to extract references from CIF files.
  warnings.warn("Please install optional dependency pybtex if you"

Validation failed: <custodian.vasp.validators.VasprunXMLValidator object at 0x2b94f8d449b0>

Traceback (most recent call last):
  File "/home1/xiaoh/miniconda3/lib/python3.7/site-packages/custodian/custodian.py", line 320, in run
    self._run_job(job_n, job)
  File "/home1/xiaoh/miniconda3/lib/python3.7/site-packages/custodian/custodian.py", line 428, in _run_job
    raise CustodianError(s, True, v)
custodian.custodian.CustodianError: (CustodianError(…), 'Validation failed: <custodian.vasp.validators.VasprunXMLValidator object at 0x2b94f8d449b0>')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home1/xiaoh/miniconda3/lib/python3.7/site-packages/fireworks/core/rocket.py", line 262, in run
    m_action = t.run_task(my_spec)
  File "/home1/xiaoh/miniconda3/lib/python3.7/site-packages/atomate/vasp/firetasks/run_calc.py", line 204, in run_task
    c.run()
  File "/home1/xiaoh/miniconda3/lib/python3.7/site-packages/custodian/custodian.py", line 330, in run
    .format(self.total_errors, ex))
RuntimeError: 0 errors reached: (CustodianError(…), 'Validation failed: <custodian.vasp.validators.VasprunXMLValidator object at 0x2b94f8d449b0>'). Exited…

The error I got in “std_err.txt” is:

[mpiexec@n0114] HYDU_parse_hostfile (…/…/utils/args/args.c:535): unable to open host file: $CURDIR/nodelist.41419
[mpiexec@n0114] machine_file_fn (…/…/ui/mpich/utils.c:509): error parsing machine file
[mpiexec@n0114] match_arg (…/…/utils/args/args.c:243): match handler returned error
[mpiexec@n0114] HYDU_parse_array (…/…/utils/args/args.c:269): argument matching returned error
[mpiexec@n0114] parse_args (…/…/ui/mpich/utils.c:4009): error parsing input array
[mpiexec@n0114] HYD_uii_mpx_get_parameters (…/…/ui/mpich/utils.c:4339): unable to parse user arguments

My impression is that atomate didn't run VASP properly with the LSF system. Note that I have modified the LoadSharingFacility_template.txt file, which is included in the attached zip file.

I also included my_fworker.yaml in the offline folder of the attached zip file. I am confused about how to define vasp_cmd in my_fworker.yaml.

For reference, I have also attached the LSF script I usually use to submit VASP jobs on this cluster.

If I define vasp_cmd as "mpirun -np 16 vasp", the jobs are not run on the compute nodes. If I define vasp_cmd as "mpirun -np $NPROCS -machinefile $CURDIR/nodelist.$LSB_JOBID /home1/xiaoh/vasp/vasp.5.3-vtst/vasp > log", then the variables NPROCS, CURDIR, and LSB_JOBID are not defined, which leads to the error in std_err.txt shown above.

Could you guide me on running atomate in offline mode on a cluster with the LSF job management system?

Thank you very much!

Hang Xiao

Postdoc

Columbia University