How to test/develop a parser?

I am stuck at getting any local Nomad parser running.

My goal: I have simulation data in a binary format and want to import it into a local Nomad Oasis. I already have Python code that reads the output files and can process the data and metadata, with a view to mapping them onto the target ontology.

Now I am trying to integrate this into a Nomad parser plugin. And I’m stuck at the first step, which is to get any kind of parser running. I have tried to make sense of the documentation on how to write a parser, but unfortunately to no avail so far.

Even without any of my own code, the suggested instructions alone already fail (I have installed Nomad into a local pyenv):

  1. Executed the cruft template.
  2. Installed the plugin package via pip install -e . as described here. The directory /path/to/plugin/.pyenv/lib/python3.10/site-packages/nomad_mweparser-0.1.0.dist-info exists.
  3. Left the contents of the plugin unmodified, i.e. per MyParserEntryPoint(), the mainfile regex should match files called '.*\.myparser'.
  4. Created an empty file called test.myparser
  5. Ran nomad parse test.myparser --show-archive
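To rule out the pattern itself, the file name can be checked against the regex outside of Nomad. This is only a sketch: it assumes that the pattern has to cover the whole path (hence `re.fullmatch`), which may differ from what Nomad does internally, and `matches_mainfile` is a hypothetical helper, not part of Nomad.

```python
import re

# Pattern quoted from the unmodified template entry point.
MAINFILE_NAME_RE = r'.*\.myparser'


def matches_mainfile(path: str, pattern: str = MAINFILE_NAME_RE) -> bool:
    """Return True if the whole path matches the mainfile pattern.

    Assumption: the pattern must cover the entire path, so re.fullmatch
    is used here. This is not taken from the Nomad source code.
    """
    return re.fullmatch(pattern, path) is not None


if __name__ == '__main__':
    for candidate in ('test.myparser', '/path/to/plugin/test.myparser', 'test.txt'):
        print(candidate, matches_mainfile(candidate))
```

If the check succeeds for the test file, the pattern is unlikely to be the cause, and the remaining question is whether the plugin is visible to Nomad at all.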

Always the same result:

$ nomad parse /path/to/plugin/test.myparser
Traceback (most recent call last):
  File "/path/to/plugin/.pyenv/bin/nomad", line 8, in <module>
    sys.exit(run_cli())
  File "/path/to/plugin/.pyenv/lib/python3.10/site-packages/nomad/cli/cli.py", line 71, in run_cli
    return cli(obj=POPO())  # pylint: disable=E1120,E1123
  File "/path/to/plugin/.pyenv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/path/to/plugin/.pyenv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/path/to/plugin/.pyenv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/path/to/plugin/.pyenv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/path/to/plugin/.pyenv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/path/to/plugin/.pyenv/lib/python3.10/site-packages/nomad/cli/parse.py", line 51, in _parse
    entry_archives = parse(mainfile, **kwargs)
  File "/path/to/plugin/.pyenv/lib/python3.10/site-packages/nomad/client/processing.py", line 56, in parse
    assert parser is not None, 'there is no parser matching %s' % mainfile
AssertionError: there is no parser matching test.myparser

No combination of prefixing the command with PYTHONPATH=. and running it from different directories of the parser plugin changed this.

I do not see any indication that the local Nomad CLI recognizes Python modules that are installed in the same pyenv.
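A first sanity check is whether the interpreter behind the `nomad` command can see the plugin distribution at all. The sketch below uses only the standard library; the distribution name `nomad_mweparser` is taken from the dist-info directory mentioned above, and `installed_version` is a hypothetical helper name. It has to be run with the Python of the same pyenv.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is
    not visible to the interpreter running this script."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == '__main__':
    # The interpreter path should point into the pyenv that also holds nomad.
    print('interpreter:', sys.executable)
    print('nomad_mweparser:', installed_version('nomad_mweparser'))
```

A result of `None` would suggest that the package was installed into a different environment than the one the script runs in.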

This is currently blocking me from establishing a development feedback cycle for the parser code, and therefore from working on the actual scientific part.

How can I add a local parser to the plugin subsystem of the Nomad CLI?

Hi @Meax5qiu!

Thanks for getting in contact. Let’s try to find a solution to your problem. Are you sure that the parser is registered correctly in pyproject.toml? Something like:

[project.entry-points.'nomad.plugin']
myparser = "nomad_mweparser.parsers:myparser"

If you modify this file, you will also need to rerun pip install -e .
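One way to check the registration from Python is to list what the active environment exposes under the `nomad.plugin` entry-point group. This is a minimal sketch using only the standard library; `list_entry_points` is a hypothetical helper name, and the fallback branch is there only for Python versions older than 3.10.

```python
from importlib import metadata
from typing import List, Tuple


def list_entry_points(group: str = 'nomad.plugin') -> List[Tuple[str, str]]:
    """Return (name, target) pairs of all entry points in the given group."""
    eps = metadata.entry_points()
    if hasattr(eps, 'select'):
        # Python 3.10 and newer: selectable entry points.
        selected = eps.select(group=group)
    else:
        # Older Python versions return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


if __name__ == '__main__':
    for name, target in list_entry_points():
        print(f'{name} -> {target}')
```

If `myparser` does not show up in the output, the entry point is most likely not registered in this environment, for example because the package was not reinstalled after editing pyproject.toml.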

If possible, could you share the whole git repository for your plugin? This would allow us to reproduce the problem easily.

Thank you. I decided that an MWE is necessary in this case. I started from scratch (cruft, pip install) and packaged this into a public repo:

From here, the parser is now accepting mainfiles based on file extension, just as advertised. I don’t understand why it did not work before, but I will base my further work on small steps starting from here.

As I have outlined in the root README of the repository and the parser code, progress on parser implementation is now stuck at the point where the EntryArchive is to be populated.

Hence the logical follow-up question: How does the parser know which schema to apply? How do I load the schema code so that it can be used?

The text file parser how-to is currently incomplete, in the sense that it does not say which steps are needed to tell the parser where to find the schema plugin (the plugin contents have been prepared alongside in that section of the how-to).

Thanks a lot for your assistance. I have noticed that the quality of the NOMAD documentation has improved substantially in recent months, which makes me all the more baffled that I’m still stuck at these very basic hurdles.

Hi @Meax5qiu !

Hence the logical follow-up question: How does the parser know which schema to apply? How do I load the schema code so that it can be used?

I am coming here just to quickly answer this question. Tomorrow, Wednesday 10.07.2024, we will have a Tutorial explaining precisely this:

We will show how to use the simulations schema, import it in the parser, extend it, etc.

I can only suggest that you join, and if something is still unclear, join the NOMAD Discord and keep asking us there. We have dedicated channels for simulations in NOMAD :slightly_smiling_face:

Oh great! I’ll drop by tomorrow for sure.

Many thanks for the tutorial! On a fundamental level, it helped me a lot in understanding the current state and scope of NOMAD. But also practically, I have now seen how a parser retrieves a schema and populates it with data. The necessary steps are in this changeset to my MWE repo.
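Stripped of everything NOMAD-specific, the pattern looks roughly like the sketch below: the parser imports the schema class like any other Python class, instantiates it with the parsed values, and attaches it to the archive. All names here (`MySimulation`, `FakeArchive`, `MyParser`) are hypothetical stand-ins so that the example runs without NOMAD installed. In a real plugin the section class would be defined with NOMAD's schema base classes, the archive would be the EntryArchive passed in by NOMAD, and the `parse` signature is only assumed to resemble the one generated by the template.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MySimulation:
    """Stand-in for a schema section (hypothetical, not a NOMAD class)."""
    program_name: Optional[str] = None
    n_atoms: Optional[int] = None


@dataclass
class FakeArchive:
    """Stand-in for the archive object; only the `data` slot is modelled."""
    data: Any = None


class MyParser:
    """Stand-in parser whose parse() fills the archive from a mainfile."""

    def parse(self, mainfile: str, archive: FakeArchive, logger=None) -> None:
        # Placeholder values: the existing code that reads the binary
        # output files would produce the real ones here.
        values = {'program_name': 'mycode', 'n_atoms': 2}
        # The schema is used like any imported Python class.
        archive.data = MySimulation(**values)


if __name__ == '__main__':
    archive = FakeArchive()
    MyParser().parse('test.myparser', archive)
    print(archive.data)
```

The point of the sketch is only the direction of the dependency: the parser module imports the schema module, not the other way round.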

This very minimal example works now, but this is of course only the first step, with quite some more to come. Expect me to show up on Discord :wink:

Anyway, in my experience it is hard to find complete but simple resources that show off how to get a parser on rails, in particular with a schema developed in parallel. The tutorials and examples that I have found leave me with the impression that they are either

  • simplistic, to the point where they wouldn’t showcase any but the most trivial setup (e. g. the nomad-parser-plugin-example repo)
  • prone to miss out on some context (e. g. in the how-to section of the documentation or the tutorials, where code snippets are taken from files but it is not clear what the complete files look like. I understand that you don’t want to always print whole files, but it would still be great if one could see the context for some piece of code in one piece.)
  • at the full complexity of released parsers, where you are on your own to find out whether they use a programming technique that might be useful for you too.

More and non-trivial (but still compact) MWEs would be a great addition to lower the entry barrier into parser development (or showcase the impact of architectural changes, like schema updates or the conversion from run to data).

Firstly, we’re glad to hear that you liked the tutorial and that it helped you situate file selection better. Running over some of the points you raised:

[…] it is hard to find complete but simple resources that show off how to get a parser on rails, in particular with a schema developed in parallel.

Thank you for the feedback there. We think that you will approve of this tutorial’s docs then. It aims to strike a balance between too simple or too complex. If you feel that it could still be further improved, feel free to reach out to us. We’re looking forward to hearing from you on Discord (channel fairmat-tutorial-14) :slight_smile:

simplistic, to the point where they wouldn’t showcase any but the most trivial setup (e. g. the nomad-parser-plugin-example)

We will coordinate with the other Areas that manage the documentation you mentioned to clarify which users / degree of sophistication they target.

full complexity of released parsers […]

Note that the majority of the current parsers still target the (soon-to-be) deprecated run schema. As the new schema settles in, we will start releasing updated parsers that will hopefully be more accessible.

The necessary steps are in this changeset to my MWE repo.

Glad to see that yesterday helped you progress! Regarding the link to your MWE, is it working now? Do you need further help / feedback, or are you sharing it as a suggestion for us to include in our own docs?

Best,
Area C