custodian.custodian.ValidationError in Installing atomate

Hello.
I’m a Japanese student, so if there is something wrong with my question, please let me know.
I’m following the “Installing atomate” guide and finished the “Submit the workflow” step. However, the following error appeared in “FW_job-***.error”:

```
Validation failed: <custodian.vasp.validators.VasprunXMLValidator object at 0x150f6451eb70>
Traceback (most recent call last):
  File "/home/kamatani/.pyenv/versions/3.7.3/lib/python3.7/site-packages/fireworks/core/rocket.py", line 262, in run
    m_action = t.run_task(my_spec)
  File "/home/kamatani/.pyenv/versions/3.7.3/lib/python3.7/site-packages/atomate/vasp/firetasks/run_calc.py", line 205, in run_task
    c.run()
  File "/home/kamatani/.pyenv/versions/3.7.3/lib/python3.7/site-packages/custodian/custodian.py", line 328, in run
    self._run_job(job_n, job)
  File "/home/kamatani/.pyenv/versions/3.7.3/lib/python3.7/site-packages/custodian/custodian.py", line 452, in _run_job
    raise ValidationError(s, True, v)
custodian.custodian.ValidationError: Validation failed: <custodian.vasp.validators.VasprunXMLValidator object at 0x150f6451eb70>
```

In “Add a workflow”, I used this command:

```
atwf add -l vasp -s optimize_only.yaml -m mp-149 -c '{"vasp_cmd": "sbatch /home/kamatani/atomate/config/slurm_mpi_job.sh", "db_file": "/home/kamatani/atomate/config/db.json"}'
```

I don’t know what the cause is.
Please help me.

Thank you for reposting this here.

Usually, an error like yours (copied below) shows up when VASP does not run correctly.

```
custodian.custodian.ValidationError: Validation failed: <custodian.vasp.validators.VasprunXMLValidator object at 0x150f6451eb70>
```

It looks like you are setting your “vasp_cmd” to

```
sbatch /home/kamatani/atomate/config/slurm_mpi_job.sh
```

sbatch is typically used to submit batch files to the queue, while the “vasp_cmd” should be the command that actually runs VASP, usually something like “mpirun vasp_std”. If you want to run atomate and VASP through a job scheduler like SLURM, you should set up the “my_qadapter.yaml” file (as discussed here: https://atomate.org/installation.html#my-qadapter-yaml). Once this is set up, you can add your workflow with the proper VASP command and then submit the workflow to the scheduler by running the “qlaunch singleshot” command.
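For reference, here is a minimal sketch of that sequence, reusing the workflow and db_file path from your earlier command (adjust the paths for your own system):

```bash
# Sketch only: the paths below are the ones from this thread.
# 1. Add the workflow with a command that actually runs VASP, not an sbatch call:
atwf add -l vasp -s optimize_only.yaml -m mp-149 \
    -c '{"vasp_cmd": "mpirun vasp_std", "db_file": "/home/kamatani/atomate/config/db.json"}'

# 2. Submit one job to the SLURM queue through my_qadapter.yaml:
qlaunch singleshot
```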


Thank you for the reply.

However, I think I already finished setting up the “my_qadapter.yaml” file: I made it a few days ago as shown below, but the workflow still fails.

My “my_qadapter.yaml” file is:

```yaml
_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch -c /home/kamatani/atomate/config rapidfire
nodes: 2
walltime: 24:00:00
queue: null
account: null
job_name: null
pre_rocket: null
post_rocket: null
logdir: /home/kamatani/atomate/logs
```

That looks like it should be a good start to configuring that file. Whether or not you need to set the queue or account will depend on what computer you are using. If those need to be set in typical batch scripts for your system (those submitted with sbatch, for example), they need to be set in the my_qadapter.yaml also. I would also set the job_name to a string, to be safe (I’m not sure if null works).
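For illustration, a filled-in version of your file might look like the following. The queue and account values here are made up; use whatever your cluster’s batch scripts normally specify:

```yaml
_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch -c /home/kamatani/atomate/config rapidfire
nodes: 2
walltime: 24:00:00
queue: regular          # hypothetical partition name; check your cluster's docs
account: my_account     # hypothetical allocation name; omit if your cluster doesn't use one
job_name: atomate_job   # a plain string, to be safe
pre_rocket: null
post_rocket: null
logdir: /home/kamatani/atomate/logs
```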

In the screenshot you sent, it looks like the code is still using the wrong vasp_cmd, and it should be changed to something like {"vasp_cmd": "mpirun vasp_std"}.

Also, the screenshots and the copied-and-pasted code are quite hard to read. Could you avoid screenshots and format any pasted code more nicely in the future?

I see.
I changed vasp_cmd and ran the “qlaunch singleshot” command, but the same error occurred.

I also set the job_name to a string. It still fails.

```
(atomate_env) kamatani@valkyrie01 $ lpad reset [~/calc/vasp/mgo]
Are you sure? This will RESET 1 workflows and all data. (Y/N)Y
2019-06-10 17:48:59,453 INFO Performing db tune-up
2019-06-10 17:48:59,659 INFO LaunchPad was RESET.
(atomate_env) kamatani@valkyrie01 $ atwf add -l vasp -s optimize_only.yaml -m mp-149 -c '{"vasp_cmd": "mpirun vasp_std", "db_file": "/home/kamatani/atomate/config/db.json"}'
2019-06-10 17:49:12,322 INFO Added a workflow. id_map: {-1: 1}
(atomate_env) kamatani@valkyrie01 $ qlaunch singleshot [~/calc/vasp/mgo]
2019-06-10 17:49:19,543 INFO moving to launch_dir /home/kamatani/calc/vasp/mgo
2019-06-10 17:49:19,544 INFO submitting queue script
2019-06-10 17:49:19,561 INFO Job submission was successful and job_id is 735
(atomate_env) kamatani@valkyrie01 $ lpad get_wflows [~/calc/vasp/mgo]
{
    "state": "FIZZLED",
    "name": "Si--1",
    "created_on": "2019-06-10T08:49:12.293000",
    "states_list": "F"
}
```

Ok, great. You can see that the Firework fizzled. Now we want to investigate why that happened.

There are two places to start looking; both can be inspected by starting with the Firework entry in your launchpad.

  1. The traceback for the Firework

You can see from the output of lpad get_wflows that the ID of the Firework that fizzled is “1”. We can inspect this Firework in more detail by running

```
lpad get_fws -i 1 -d more
```

where the “-i 1” means to get the Firework with the id of 1 and the “-d more” means that we will get more details (there is also a “-d all”, but more is enough for this case).

If you look at the output, you should see that there is a list of “launches” dictionaries. You’ll want to look at what the “traceback” key says. This is the actual Python error that caused it to fizzle.

  2. The local directory where the job ran.

Looking at the same output, you can see there’s also a “launch_dir” key. This is the place on your filesystem where the job actually ran. You can change to that directory and try to understand what happened as well (a quick sketch of this is at the end of this post).

Feel free to post the output of lpad get_fws -i 1 -d more if you need help interpreting the output.
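For the second point, here is a rough sketch of what that inspection might look like. The output file names assume custodian’s defaults, which may differ depending on your workflow settings:

```bash
# Sketch only: cd into the launch_dir reported by `lpad get_fws`.
cd /path/to/your/launch_dir   # placeholder; use your actual launch_dir
ls -la                        # look for INCAR/POSCAR/KPOINTS/POTCAR and any VASP output
cat vasp.out                  # custodian's default file for VASP stdout, if present
cat std_err.txt               # custodian's default file for VASP stderr, if present
```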

I appreciate it.
The output is:

```
(atomate_env) kamatani@valkyrie01 $ lpad get_fws -i 1 -d more [~/calc/vasp/mgo]
{
    "fw_id": 1,
    "created_on": "2019-06-10T08:49:12.292858",
    "updated_on": "2019-06-10T08:49:35.686104",
    "launches": [
        {
            "fworker": {
                "name": "kamatani",
                "category": "",
                "query": "{}",
                "env": {
                    "db_file": "/home/kamatani/atomate/config/db.json",
                    "vasp_cmd": "sbatch /home/kamatani/atomate/config/slurm_mpi_job.sh",
                    "scratch_dir": null
                }
            },
            "fw_id": 1,
            "launch_dir": "/home/kamatani/calc/vasp/mgo/launcher_2019-06-10-08-49-25-373963",
            "host": "valkyrie16",
            "ip": "192.168.114.95",
            "trackers": [],
            "action": {
                "stored_data": {
                    "_message": "runtime error during task",
                    "_task": {
                        "vasp_cmd": "mpirun vasp_std",
                        "job_type": "double_relaxation_run",
                        "max_force_threshold": 0.25,
                        "ediffg": null,
                        "auto_npar": ">>auto_npar<<",
                        "half_kpts_first_relax": false,
                        "_fw_name": "{{atomate.vasp.firetasks.run_calc.RunVaspCustodian}}"
                    },
                    "_exception": {
                        "_stacktrace": "Traceback (most recent call last):\n  File \"/home/kamatani/.pyenv/versions/3.7.3/lib/python3.7/site-packages/fireworks/core/rocket.py\", line 262, in run\n    m_action = t.run_task(my_spec)\n  File \"/home/kamatani/.pyenv/versions/3.7.3/lib/python3.7/site-packages/atomate/vasp/firetasks/run_calc.py\", line 205, in run_task\n    c.run()\n  File \"/home/kamatani/.pyenv/versions/3.7.3/lib/python3.7/site-packages/custodian/custodian.py\", line 328, in run\n    self._run_job(job_n, job)\n  File \"/home/kamatani/.pyenv/versions/3.7.3/lib/python3.7/site-packages/custodian/custodian.py\", line 452, in _run_job\n    raise ValidationError(s, True, v)\ncustodian.custodian.ValidationError: Validation failed: <custodian.vasp.validators.VasprunXMLValidator object at 0x1540dfedcd30>\n",
                        "_details": null
                    }
                },
                "exit": true,
                "update_spec": {},
                "mod_spec": [],
                "additions": [],
                "detours": [],
                "defuse_children": false,
                "defuse_workflow": false
            },
            "state": "FIZZLED",
            "state_history": [
                {
                    "state": "RUNNING",
                    "created_on": "2019-06-10T08:49:25.420923",
                    "updated_on": "2019-06-10T08:49:35.670536",
                    "checkpoint": {
                        "_task_n": 1,
                        "_all_stored_data": {},
                        "_all_update_spec": {},
                        "_all_mod_spec": []
                    }
                },
                {
                    "state": "FIZZLED",
                    "created_on": "2019-06-10T08:49:35.674017",
                    "checkpoint": {
                        "_task_n": 1,
                        "_all_stored_data": {},
                        "_all_update_spec": {},
                        "_all_mod_spec": []
                    }
                }
            ],
            "launch_id": 1
        }
    ],
    "state": "FIZZLED",
    "name": "Si-structure optimization"
}
```

This is more or less what I expected to see - you’re still getting the Vasprun Validator error. What happens if you go to the launch directory and run “mpirun vasp_std” by hand?

Well, like this:

```
(atomate_env) kamatani@valkyrie01 $ mpirun vasp_std [~/calc/vasp/mgo]
···
mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
  line parameter option (remember that mpirun interprets the first
  unrecognized command line token as the executable).

Node: valkyrie01
Executable: vasp_std

16 total processes failed to start

(atomate_env) kamatani@valkyrie01 $ cd launcher_2019-06-10-08-49-25-373963 [~/calc/vasp/mgo]
(atomate_env) kamatani@valkyrie01 $ mpirun vasp_std [~/calc/vasp/mgo/launcher_2019-06-10-08-49-25-373963]

mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
  line parameter option (remember that mpirun interprets the first
  unrecognized command line token as the executable).

Node: valkyrie01
Executable: vasp_std

16 total processes failed to start
```

Is vasp_std on your PATH? What’s the output of “which vasp_std”? How have you installed/loaded VASP? Have you run VASP before?
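In case it helps, here is a quick way to check, assuming a typical module-based HPC environment (the module name is a guess; your cluster may manage VASP differently):

```bash
# Sketch only: check whether the VASP binary is visible to your shell.
which vasp_std || echo "vasp_std is not on PATH"

# If your cluster uses environment modules:
module avail 2>&1 | grep -i vasp   # see whether a VASP module exists
# module load <vasp-module-name>   # then check `which vasp_std` again
```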