Initialization of custom parser seems to fail

Hello,
I am trying to create my first parser for some custom CSV files (a few header lines containing metadata, followed by measurement data in table form).
I started from the nomad-coe/nomad-parser-plugin-example template on GitHub (an example for NOMAD parser plugins, meant to be forked to create actual plugins) and, for now, basically just changed the name “example” to “ex” everywhere.
First tests with a simple text file worked fine using the described local testing procedure (export PYTHONPATH=., nomad parse tests/data/test.ex.txt --show-archive).

When switching the test file from plain text to CSV, I changed:
nomad_plugin.yaml: mainfile_name_re: ^.*\.ex\.txt$ to mainfile_name_re: ^.*\.ex\.csv$
and
tests/data/test.ex.txt to tests/data/test.ex.csv (comma-separation in all lines).

In the parser.py I included this:

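from nomad.parsing.parser import MatchingParser

class ExParser(MatchingParser):
    def __init__(self):
        print('ExParser initialization')

        # Intended to replace the default mime pattern ('text/.*')
        super().__init__(
            name='parsers/ex',
            mainfile_mime_re=r'application/csv')

    def parse(...):
        ...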

I get the following error: AssertionError: there is no parser matching test.ex.csv

When inspecting the nomad library, I can see that the input file is tested against my custom parser (parsers/ex). However, the match fails in the is_mainfile method of MatchingParser because self.mainfile_mime_re still evaluates to its default value, re.compile('text/.*'), while the input mime type is now correctly recognized as application/csv. (Using the previous text file, or just adding a single empty line at the beginning of the CSV file, changes the recognized mime type to text/plain.)
It seems that no object of my custom parser class ExParser is instantiated in a way that actually applies the mainfile_mime_re=r'application/csv' assignment from its __init__ function to MatchingParser. What am I missing?
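
To illustrate the mismatch, here is a minimal sketch that uses only Python's re module rather than NOMAD itself. It assumes the mime check boils down to a plain re.match against the detected mime type; mime_matches is a hypothetical helper and not a NOMAD function.

```python
import re


def mime_matches(pattern: str, mime: str) -> bool:
    """Hypothetical helper: True if the mime type matches the pattern from its start."""
    return re.compile(pattern).match(mime) is not None


# Pattern that is still in effect according to the debugger, and the one
# the parser class tries to set in __init__.
default_pattern = r'text/.*'
intended_pattern = r'application/csv'

for mime in ('text/plain', 'application/csv'):
    print(mime,
          'default:', mime_matches(default_pattern, mime),
          'intended:', mime_matches(intended_pattern, mime))
```

With the default pattern, only text/plain passes, which would explain why the plain text file was picked up while the CSV file was not.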

Thank you very much,
Lukas


Ok, I actually got it to work by including
mainfile_mime_re: application/csv in the nomad_plugin.yaml


It seems that the mainfile_mime_re from nomad_plugin.yaml overrides whatever you have in your class. I am not sure whether this should count as a bug. We will look into this.

Just for some context: NOMAD uses libmagic to determine the mime type of a file. Matching on the mime type alone is very broad. If NOMAD uses multiple parsers that are interested in files of that type, the parsers might shadow each other.
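
A rough sketch of the shadowing problem, under the assumption that parsers are tried in order and the first match wins. ParserSpec, first_match, and the parsers/other entry are hypothetical and only for illustration; this is not NOMAD's actual matching code.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParserSpec:
    """Hypothetical stand-in for a parser's matching criteria (not a NOMAD class)."""
    name: str
    mime_re: str
    name_re: str = r'.*'  # default: accept any file name

    def matches(self, filename: str, mime: str) -> bool:
        # Both the mime type and the file name have to match.
        return (re.match(self.mime_re, mime) is not None
                and re.match(self.name_re, filename) is not None)


def first_match(parsers: List[ParserSpec], filename: str, mime: str) -> Optional[str]:
    """Return the name of the first parser that matches, or None."""
    for parser in parsers:
        if parser.matches(filename, mime):
            return parser.name
    return None


# Both parsers match on the mime type only, so the first one shadows the second.
broad = [
    ParserSpec('parsers/other', r'application/csv'),
    ParserSpec('parsers/ex', r'application/csv'),
]

# Adding a file name pattern makes each parser claim only its own files.
narrow = [
    ParserSpec('parsers/other', r'application/csv', r'^.*\.other\.csv$'),
    ParserSpec('parsers/ex', r'application/csv', r'^.*\.ex\.csv$'),
]

print(first_match(broad, 'test.ex.csv', 'application/csv'))   # parsers/other
print(first_match(narrow, 'test.ex.csv', 'application/csv'))  # parsers/ex
```

This is why it helps to keep a specific mainfile_name_re alongside the mime pattern, as with ^.*\.ex\.csv$ in the original post.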