Help needed with Wien2k parser hacking

Dear all,

I’ve been playing a bit with the Wien2k parser, with mixed success.

My first question is how to treat variables that are printed in every SCF iteration but are mostly of interest only in the last one, such as the total energy.

So far I have added a helper variable in initialize_values:
self.totE = 0
I save the value after every SCF iteration in onClose_section_scf_iteration, and later add it to the backend in onClose_section_single_configuration_calculation:
backend.addValue("energy_total", self.totE)
However, this feels like a lot of unnecessary copying and code.

For example the current Wien2k parser has code like this:

def onClose_section_system(self, backend, gIndex, section):
    # atom force
    atom_force = []
    for i in ['x', 'y', 'z']:
        api = section['x_wien2k_for_' + i]
        if api is not None:
            atom_force.append(api)

This doesn’t work because the section is empty: there is no 'x_wien2k_for_x' in the section inside onClose_section_system (it’s accessible only in onClose_section_scf_iteration), so api is always None, and atom_force belongs to section_single_configuration_calculation anyway. However, if one could do it like this, it would simplify things somewhat. So I wonder what the original intent of the author was. Is it possible to access variables parsed in one section from the onClose of another section?
I feel this is somehow related to the caching concept which I don’t get at all, but maybe this is completely unrelated.

What I also don’t understand is how repeating variables are treated. For example, when parsing repeating lines (inside section_scf_iteration) using something similar to this example line:
SM(r"ATOM[0-9]+\s*(?P<x_wien2k_force>[-+0-9.]+)", repeats = True),
it will match several lines, and when I do
print(section['x_wien2k_force']) in onClose_section_scf_iteration
it correctly shows an array with the values for all matched atoms. But later, when looking at the output of "nomad parse --show-archive", there is only the last value; the rest was lost somewhere along the way. How can I make the whole array propagate into the archive?
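To make the expected result concrete outside the parser, here is a minimal sketch that applies the same regular expression with Python's standard `re` module. The sample lines are invented for illustration and are not real Wien2k output; the point is only that one value per matched line is what should end up in the archive.

```python
import re

# Same pattern as in the SM(...) line above, compiled with plain `re`.
FORCE_LINE = re.compile(r"ATOM[0-9]+\s*(?P<x_wien2k_force>[-+0-9.]+)")

# Hypothetical sample text (made up for this sketch, not real Wien2k output).
sample_output = """\
ATOM1   0.125
ATOM2  -0.250
ATOM3   0.000
"""

# Collect one value per matching line, as a repeating matcher would.
forces = [
    float(m.group("x_wien2k_force"))
    for m in FORCE_LINE.finditer(sample_output)
]
print(forces)  # [0.125, -0.25, 0.0]
```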

Any help would be appreciated.
Best regards
Pavel

Hi Pavel, thanks for your contribution.

You are addressing several issues. First, I think the total energy of each SCF iteration should be stored in energy_total_scf_iteration within each section_scf_iteration, while the resulting total energy (i.e. the one from the last SCF iteration?) should be stored in energy_total of section_single_configuration_calculation.
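A minimal sketch of that split, under stated assumptions: `RecordingBackend` and `Wien2kContextSketch` are hypothetical stand-ins written for this example, the section is modelled as a plain dict of lists, and the matcher is assumed to have already stored energy_total_scf_iteration in each section_scf_iteration. Only the hook names and `backend.addValue` are taken from the posts above.

```python
class RecordingBackend:
    """Hypothetical stand-in for the parser backend; it only records calls."""

    def __init__(self):
        self.values = []

    def addValue(self, name, value):
        self.values.append((name, value))


class Wien2kContextSketch:
    """Illustrative parser context, not the actual Wien2k parser class."""

    def initialize_values(self):
        # None instead of 0, so "no energy parsed" is distinguishable.
        self.last_total_energy = None

    def onClose_section_scf_iteration(self, backend, gIndex, section):
        # Assumption: the per-iteration energy is already in this section
        # under energy_total_scf_iteration; we only remember the latest one.
        energies = section.get("energy_total_scf_iteration")
        if energies:
            self.last_total_energy = energies[-1]

    def onClose_section_single_configuration_calculation(
        self, backend, gIndex, section
    ):
        # Write the final energy once, after all SCF iterations are closed.
        if self.last_total_energy is not None:
            backend.addValue("energy_total", self.last_total_energy)


# Usage with made-up energies for three SCF iterations.
backend = RecordingBackend()
context = Wien2kContextSketch()
context.initialize_values()
for energy in (-10.0, -10.5, -10.52):
    context.onClose_section_scf_iteration(
        backend, 0, {"energy_total_scf_iteration": [energy]}
    )
context.onClose_section_single_configuration_calculation(backend, 0, {})
print(backend.values)  # [('energy_total', -10.52)]
```

This still needs one helper attribute, but the per-iteration values stay where the matcher put them and only the final one is copied.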

Second, you are right: this code is in the wrong method and needs to move to onClose_section_scf_iteration. I cannot tell you what the original author’s intention was.

For the third issue, I am not sure whether this is by design or a bug. A workaround would be to overwrite the wrong value in the archive. For example, in onClose_section_scf_iteration you could do something like backend.addValues('x_wien2k_force', section['x_wien2k_force']). Also make sure that the quantities (e.g. x_wien2k_force) are defined and have the proper shape (see wien2kparser/metainfo/wien2k.py). This is also important if you want to add new quantities.
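Roughly, the workaround could look like the sketch below. `ArchiveStub` is a hypothetical stand-in written for this example, and its `addValues` method simply mirrors the call named above; I have not verified that the real backend exposes a method with exactly this name and signature. The section is modelled as a plain dict.

```python
class ArchiveStub:
    """Hypothetical stand-in for the backend; stores values by quantity name."""

    def __init__(self):
        self.archive = {}

    def addValues(self, name, values):
        # Assumption: the whole list is stored as one array-valued quantity.
        self.archive[name] = list(values)


def onClose_section_scf_iteration(backend, gIndex, section):
    # section is modelled as a dict; a missing quantity yields None.
    forces = section.get("x_wien2k_force")
    if forces is not None:
        # Write all matched values at once instead of relying on
        # whatever was stored per match.
        backend.addValues("x_wien2k_force", [float(f) for f in forces])


# Usage with made-up matched strings for three atoms.
stub = ArchiveStub()
onClose_section_scf_iteration(
    stub, 0, {"x_wien2k_force": ["0.125", "-0.250", "0.000"]}
)
print(stub.archive)  # {'x_wien2k_force': [0.125, -0.25, 0.0]}
```

For this to survive into the archive, the quantity definition would also need an array shape rather than a scalar one.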

You are right, this is all cumbersome and not really satisfying. You can also judge from my answer that I am not very familiar with how these parsers and the library they use actually work. For that reason, we are overhauling the overall parser design. I hope that we can convert the Wien2k parser to the new system soon. I think a lot of these issues will become much clearer then.

Otherwise, I can only refer to the documentation that we have.
For writing quantity definitions: https://nomad-lab.eu/prod/rae/docs/metainfo.html#module-nomad.metainfo
For the SM parser stuff: https://nomad-lab.eu/prod/rae/docs/dev/parser_tutorial.html#simple-matcher

Thanks again for any improvements to the parser.