Oasis: parsers are interfering with each other

fabian_li · October 24, 2023, 3:18pm

Hello.
On our Nomad Oasis we have multiple custom parsers installed. I have 2 problems:

1. For every test upload I make, the words SinglePoint simulation are wrongly added to the file entry’s name, and also to the workflow name. Almost none of the parsers’ code mentions the word SinglePoint; only one parser (the apbs parser) mentions it, and I think only in a code comment. Why does NOMAD keep adding SinglePoint to every upload? How do I stop this?
2. Two parsers, the battery-parser and the graphene-parser, are interfering with each other. The interference can be seen after I upload test files for the battery parser:
   In run.calculation.concentrations, if you select any of the molecules, the subsection that describes those molecules should be called Concentrations_Battery, but it is called Concentrations_Graphene instead, so somehow the battery parser is accessing the graphene parser. Also, if you click on the name of any of the molecules and activate definitions in the NOMAD GUI, you can see a description that goes something like ..."e" means already attached to graphene edge, which is a description from the graphene parser.
   NOMAD also clearly shows in the Files tab of the GUI that the correct parser is being selected (battery for battery simulation uploads), which is what makes this problem so confusing to me.

Here is the link to our gitlab repo: Sign in · GitLab
If I have to grant you some access rights, tell me please.
For the battery parser, there are test files in the tests/data folder, mainfile is input_battery.yml. All parsers work perfectly when tested locally with the nomad parse command.

Best
Fabian

JosePizarro · October 24, 2023, 6:15pm

Hello @fabian_li,

We are very happy that you keep working on NOMAD, and even developing your own parsers. This is really an achievement.

Regarding your questions:

1. On the computational side of NOMAD, we have the following convention for defining method_name and workflow_name. Both names depend on which sections are defined, i.e., normalization takes care of setting method_name = 'dft' if there is a populated section run.method.dft, or method_name = 'gw' if run.method.gw exists (and so on). Something similar happens with workflow_name, but in that case we distinguish a basic workflow from pre-defined workflows (for example, GeometryOptimization). The basic workflow is called SinglePoint and is assigned whenever you populate a single section run.calculation, regardless of whether you define workflow2 in your archive.
   The name SinglePoint simulation is then set in a quantity called entry_name. The logic is a bit involved, but it can be summarized as follows: if workflow_name != 'SinglePoint' it takes the method_name, but if method_name does not exist, it still uses workflow_name. The name also includes the chemical formula and the program name. You can check out one of the latest entries in the central NOMAD archive to see in more detail what I mean.
   If you want, you can share with us how you would like to name entries, and we can take a more in-depth look and help improve the naming of entries, or at least give enough flexibility for your two parsers.
2. Without taking a look at the custom parsers, it is a bit complicated to know what is going on. Maybe you can give me temporary access to the KIT GitLab and I can take a look. A first idea of what might be causing trouble is how you identify the parser that has to be executed, i.e., how you run the MatchingParser. Did you define some sort of identifier in input_battery.yml to select one parser or the other? Again, this might not be the source of the issue, so I would have to take a look at the repo; let me know which info you need to invite me.
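
To make the naming convention from the first point a bit more concrete, here is a rough sketch in plain Python. It is not the actual NOMAD normalizer code: the function names method_name_from_sections and entry_label are made up for illustration, the chemical formula and program name are left out because their exact placement is not described above, and the case of a SinglePoint workflow that does have a method_name is an assumption (the sketch simply falls back to workflow_name there).

```python
from typing import Iterable, Optional


def method_name_from_sections(populated: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: derive method_name from populated run.method sub-sections.

    Only the two cases mentioned in the post are covered ('dft' and 'gw').
    The check order is an assumption, not taken from NOMAD.
    """
    populated = set(populated)
    for key in ("dft", "gw"):
        if key in populated:
            return key
    return None


def entry_label(workflow_name: str, method_name: Optional[str]) -> str:
    """Hypothetical helper: build the '<label> simulation' part of entry_name."""
    if workflow_name != "SinglePoint" and method_name is not None:
        # A pre-defined workflow with a known method uses the method name.
        label = method_name
    else:
        # No method_name available: the workflow name is used.
        # (SinglePoint with a method_name also ends up here; that is an assumption.)
        label = workflow_name
    return f"{label} simulation"


# A custom parser that fills a single run.calculation but no run.method.dft / run.method.gw:
print(entry_label("SinglePoint", method_name_from_sections([])))
```

With no recognized method section and a single calculation, the label comes out as SinglePoint simulation, which would match what shows up in the test uploads.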

All the best,
Jose

fabian_li · October 25, 2023, 8:59am

Thanks for the quick reply, @JosePizarro. Please send me your GitLab username or email address so I can invite you.

JosePizarro · November 1, 2023, 5:01pm

Hi @mscheidgen,

I helped @fabian_li with problem 2 and found something interesting; I am not sure why the front-end does not resolve it.

@fabian_li has two parser plugins, graphene_parser and battery_parser, and each of them defines an MSection that ends up at the same path in the archive. The sections are named differently in the metainfo.py file of each parser: Concentrations_Graphene and Concentrations_Battery. They share the same path because they are added as extensions of Calculation, like:

class GrapheneCalculation(simulation.calculation.Calculation):
    m_def = Section(extends_base_section=True)    
    
    ...
    concentrations = SubSection(sub_section=Concentrations_Graphene.m_def, repeats=True)
    ...

and:

class BatteryCalculation(simulation.calculation.Calculation):
    m_def = Section(extends_base_section=True)    
    
    ...
    concentrations = SubSection(sub_section=Concentrations_Battery.m_def, repeats=True)
    ...

Thus, the problem is that both sections share the same path, archive.run.0.calculation.0.concentrations, and my guess is that the front-end has trouble resolving the correct m_def for the section; in the back-end everything works well. Is this expected, or can it be fixed somehow?

@fabian_li and I talked and agreed that the best practice is to define a single MSection Concentrations as a schema plugin and then use it in both parsers. As a side note, this can also be fixed by renaming the sub-sections to concentrations_graphene and concentrations_battery, but I will go with the schema plugin option.
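
To illustrate why a shared sub-section name is ambiguous, and why both fixes avoid it, here is a toy model in plain Python. It does not use NOMAD and is not how the metainfo or the GUI works internally; BaseSection and extend_base are hypothetical stand-ins for a base section that several plugins extend, with sub-section definitions looked up by name only.

```python
from typing import Dict


class BaseSection:
    """Toy stand-in for a base section such as Calculation (not NOMAD code)."""

    # Maps sub-section name -> name of the section definition it points to.
    sub_sections: Dict[str, str] = {}


def extend_base(name: str, section_def: str) -> None:
    """Hypothetical helper: attach a sub-section to the shared base under `name`."""
    # A lookup keyed only by name can hold one definition per name,
    # so a second registration under the same name replaces the first.
    BaseSection.sub_sections[name] = section_def


# Both plugins extend the same base with the same sub-section name.
# Which definition survives depends on registration order in this toy.
extend_base("concentrations", "Concentrations_Battery")
extend_base("concentrations", "Concentrations_Graphene")
collided = dict(BaseSection.sub_sections)

# Fix A: give each plugin its own sub-section name.
BaseSection.sub_sections.clear()
extend_base("concentrations_graphene", "Concentrations_Graphene")
extend_base("concentrations_battery", "Concentrations_Battery")
renamed = dict(BaseSection.sub_sections)

# Fix B: both plugins point to one shared section definition.
BaseSection.sub_sections.clear()
extend_base("concentrations", "Concentrations")
extend_base("concentrations", "Concentrations")
shared = dict(BaseSection.sub_sections)

print(collided)
print(renamed)
print(shared)
```

In the first case only one of the two definitions can be found under concentrations; with separate names both are kept, and with a shared Concentrations section the duplicate registration no longer matters.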

Thanks!