Not able to parse a calculation because vasprun.xml is not well formed

Rushik_Desai · February 16, 2024, 7:08pm

Hi,

I have been using atomate2 to parse some calculations. I am doing a high-throughput perovskite-based study and need all the data in one place, which is why I am using atomate2. I am hitting an “XML not well-formed” error while parsing some of them. When I checked which element was causing the problem, it turned out to be a type element with “^@^@^@^@” as its child.

I can parse the calculation using VaspDrone when I remove this set of characters, but the bandstructure information is then missing from the output. Can someone suggest a better way to tackle this than editing the vasprun.xml file? Also, what causes some of these calculations to contain such characters?

Thanks,
Rushik

Aaron_Kaplan · February 16, 2024, 9:38pm

The “^@” is a null character. Your vasprun.xml might contain these if something went wrong during the calculation or the file-writing step.
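A quick way to see how widespread the null characters are is to scan the file in binary mode. This is only a minimal sketch using the Python standard library; the function name `find_null_bytes` and the `max_hits` parameter are made up for illustration and are not part of pymatgen or atomate2.

```python
def find_null_bytes(path, max_hits=10):
    """Return (total_count, hits) for NUL bytes in a file.

    hits is a list of (line_number, column) pairs, both 1-based, giving the
    first NUL byte on each affected line (at most max_hits entries).
    """
    total = 0
    hits = []
    # Binary mode, so NUL bytes are neither decoded nor silently altered.
    with open(path, "rb") as fh:
        for line_number, line in enumerate(fh, start=1):
            count = line.count(b"\x00")
            if count:
                total += count
                if len(hits) < max_hits:
                    hits.append((line_number, line.index(b"\x00") + 1))
    return total, hits
```

Calling `find_null_bytes("vasprun.xml")` reports how many null bytes there are and where they start. If they show up in more than one place, or the file also ends abruptly, that would point toward an interrupted write rather than a single stray element.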

If you try to create a pymatgen.io.vasp.outputs.Vasprun object from your vasprun.xml, are you able to do so without errors/exceptions being thrown?

Also, does your workflow manager (jobflow, FireWorks, …) indicate that the job ran successfully?

Rushik_Desai · February 16, 2024, 9:42pm

I don’t use a workflow manager, but the OUTCAR file shows nothing wrong. I will try the pymatgen thing.

Rushik_Desai · February 17, 2024, 4:20pm

Traceback (most recent call last):
  File "/scratch/negishi/desai224/HSE-PBEsol/trial.py", line 3, in <module>
    Vasprun('./vasprun_original.xml')
  File "/home/desai224/.conda/envs/2022.10-py39/atomate2/lib/python3.11/site-packages/pymatgen/io/vasp/outputs.py", line 287, in __init__
    self._parse(
  File "/home/desai224/.conda/envs/2022.10-py39/atomate2/lib/python3.11/site-packages/pymatgen/io/vasp/outputs.py", line 408, in _parse
    raise exc
  File "/home/desai224/.conda/envs/2022.10-py39/atomate2/lib/python3.11/site-packages/pymatgen/io/vasp/outputs.py", line 315, in _parse
    for _, elem in ET.iterparse(stream):
  File "/home/desai224/.conda/envs/2022.10-py39/atomate2/lib/python3.11/xml/etree/ElementTree.py", line 1249, in iterator
    yield from pullparser.read_events()
  File "/home/desai224/.conda/envs/2022.10-py39/atomate2/lib/python3.11/xml/etree/ElementTree.py", line 1320, in read_events
    raise event
  File "/home/desai224/.conda/envs/2022.10-py39/atomate2/lib/python3.11/xml/etree/ElementTree.py", line 1292, in feed
    self._parser.feed(data)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 9613, column 12

This is what I get with both the atomate2 VaspDrone and pymatgen.io.vasp.Vasprun. When I remove the null characters, VaspDrone is able to parse the document except for the bandstructure. The calculation is converged according to the OUTCAR.
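To inspect the exact line the parser complains about without going through pymatgen, the same standard-library parser that appears in the traceback can be called directly. The sketch below is an illustration only: `locate_xml_error` is a hypothetical helper name, and it relies on the `position` attribute of `xml.etree.ElementTree.ParseError`.

```python
import xml.etree.ElementTree as ET


def locate_xml_error(path):
    """Return None if the file parses, else (line, column, raw_line_bytes)."""
    try:
        with open(path, "rb") as fh:
            for _event, elem in ET.iterparse(fh):
                elem.clear()  # drop parsed content to keep memory use low
    except ET.ParseError as exc:
        # Same (line, column) pair that appears in the error message.
        line_no, column = exc.position
        with open(path, "rb") as fh:
            for current, raw in enumerate(fh, start=1):
                if current == line_no:
                    return line_no, column, raw.rstrip(b"\r\n")
        return line_no, column, b""
    return None
```

Printing the returned bytes with `repr()` makes null characters visible as `\x00`, which is easier to read than the `^@` shown by some editors.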

Aaron_Kaplan · February 20, 2024, 4:52pm

It seems like your calculation failed at the final stage of writing files (happens occasionally). Also, some errors are only written to standard output by VASP and not to OUTCAR.

You probably need to rerun the calculation - if you saved the CHGCAR or WAVECAR and those were successfully written to disk, you can restart from either file to speed up the calculation.

Alternatively, if the DOSCAR and EIGENVAL files are complete, you can get the DOS and bandstructure from them, respectively.
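One rough way to judge whether EIGENVAL is complete is to compare the number of data lines with what its header promises. The sketch below rests on an assumed layout: line 6 holds the number of electrons, k-points, and bands, and each k-point block consists of one k-point line followed by one line per band. The helper name `eigenval_is_complete` is hypothetical. It only counts lines, so it cannot detect corrupted values inside an otherwise full-length file.

```python
def eigenval_is_complete(path):
    """Rough check that EIGENVAL has as many data lines as its header implies."""
    # errors="replace" keeps the check from crashing on stray non-text bytes.
    with open(path, encoding="utf-8", errors="replace") as fh:
        lines = fh.read().splitlines()
    if len(lines) < 6:
        return False
    try:
        # Assumed layout: line 6 is "n_electrons n_kpoints n_bands".
        _nelect, nkpts, nbands = (int(x) for x in lines[5].split()[:3])
    except ValueError:
        return False
    # Ignore the blank separator lines between k-point blocks.
    body = [ln for ln in lines[6:] if ln.strip()]
    # Each block: one k-point line plus one line per band.
    return len(body) == nkpts * (nbands + 1)
```

If the check passes, the file can then be handed to a proper parser; if it fails, rerunning from CHGCAR or WAVECAR as suggested above is probably the safer route.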