Python schema with subsections

fabian_li · July 6, 2023, 4:46pm

Hi.
I would like a very simple Python schema that creates a subsection called subsection1 within the data section, creates an integer quantity named test within this subsection1, and populates test with the number “5”.
I can’t figure this out because I’m confused about the differences between MSection, EntryData, ArchiveSection, EntryArchive etc. and between section=, section_def=, extends_base_section= etc.
It would be great if your Python schema was complete, i.e. including which libraries to import and including all necessary instantiations.

mscheidgen · July 7, 2023, 7:31am

Here is a working example schema and an example of how to instantiate it.

from nomad.metainfo import Quantity, Package, MSection, SubSection
from nomad.datamodel.data import EntryData

# some boilerplate to properly initialize the schema package
m_package = Package()

# The class that defines the section that you want to use for the 
# sub section. MSection is the mandatory base class for all section definitions
class SubSection1(MSection):
    test = Quantity(type=int)

# The class that defines what you want to put into data. The data
# sub section defined in EntryArchive has to be an EntryData.
# Therefore, we base this on EntryData. EntryData extends MSection btw.
class ExampleSection(EntryData):
    # When you define a sub section, the section argument says what section
    # definition to use for the sub section
    # (type is for Quantity what section is for SubSection).
    subsection1 = SubSection(section=SubSection1)

m_package.__init_metainfo__()

import json
from nomad.datamodel import EntryArchive

archive = EntryArchive(
    data=ExampleSection(
        subsection1=SubSection1(
            test=5
        )
    )
)
print(json.dumps(archive.m_to_dict(), indent=4))

This should produce the following output:

{
    "data": {
        "m_def": "__main__.ExampleSection",
        "subsection1": {
            "test": 5
        }
    }
}

MSection: All Python classes that are section definitions must inherit from this class, directly or indirectly.
EntryData: The section definition that the data sub-section uses. Everything that you want to put into data has to inherit from EntryData.
ArchiveSection: Inherit from this section definition when you want to use normalize functions. It makes sense to let all sections that you want to put into an archive inherit from ArchiveSection.
EntryArchive: The top-level section that is used for each entry's archive. This is the section that defines the first level of sub-sections: data, results, metadata, run, etc.
section, section_def, (section_definitions, sub_section): These are all aliases for the property of SubSection that allows you to specify the section definition to be used.

fabian_li · July 9, 2023, 4:27pm

Thank you very much for your reply.
I’d like to test the python schema you provided above with the nomad parse command shown in tutorial 9. For this, I’ve added a normalizer function:

from nomad.metainfo import Quantity, Package, MSection, SubSection
from nomad.datamodel.data import EntryData

m_package = Package()

class SubSection1(MSection):
    test = Quantity(type=int)

class ExampleSection(EntryData):    
    subsection1 = SubSection(section=SubSection1)
    def normalize(self, archive, logger):
        super().normalize(archive, logger)
        SubSection1.test = 5  # This line is meant to populate "test" with the number 5, but doesn't seem to do anything

m_package.__init_metainfo__()


# These following 4 lines I've commented out, because importing EntryArchive causes "circular import" errors while using the "nomad parse" command
# import json        		
# from nomad.datamodel import EntryArchive

# archive = EntryArchive(data=ExampleSection(subsection1=SubSection1(test=5)))
# print(json.dumps(archive.m_to_dict(), indent=4))

When I run nomad parse tests/data/test.archive.yaml --show-archive, there’s no error, but the desired schema of the data section (section data, subsection called SubSection1, quantity called test, populated with the number 5), doesn’t show. My test.archive.yaml file looks like this:

data:
    m_def: "nomadschemaexample.ExampleSection"

What do I have to change to get the desired schema of the data section with the number 5, using the nomad parse command? I’d like to have the number 5 hardcoded not in the test.archive.yaml file, but in the python script because I’m ultimately working on python schemas.
(As a side note, I’ve imported ArchiveSection and inherited from it in class ExampleSection because I’m using a normalizer, but the desired schema still doesn’t show.)

mscheidgen · July 10, 2023, 9:38am

Can you share your test.archive.yaml?

You are setting the value on the class, not on the object. Try to use this implementation:

def normalize(self, archive, logger):
    super().normalize(archive, logger)
    self.subsection1.test = 5
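
The difference between assigning to a class and assigning to an object can be seen in plain Python. This is only a minimal sketch with stand-in classes: `PlainSub` and `PlainEntry` are hypothetical names and not NOMAD classes, and a real NOMAD `Quantity` may behave differently in detail. The point is just that `SubSection1.test = 5` touches the class, while `self.subsection1.test = 5` touches the one instance held by the entry.

```python
class PlainSub:
    # Class-level default, shared by every instance without its own value.
    test = None


class PlainEntry:
    def __init__(self):
        # Each entry holds its own PlainSub instance.
        self.subsection1 = PlainSub()


entry = PlainEntry()
other = PlainSub()

# Assigning on the instance changes only this one object.
entry.subsection1.test = 5
print(entry.subsection1.test, other.test)

# Assigning on the class rebinds the class attribute itself;
# the value already stored on the instance is left untouched.
PlainSub.test = 7
print(entry.subsection1.test, other.test)
```

The first line printed shows that only `entry.subsection1` received the value; the second shows that the later class-level assignment did not overwrite it.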

fabian_li · July 10, 2023, 1:46pm

My test.archive.yaml is:

data:
    m_def: "nomadschemaexample.ExampleSection"

If I replace my normalizer function with this:

def normalize(self, archive, logger):
    super().normalize(archive, logger)
    self.subsection1.test = 5

I get the error “could not normalize section (normalizer=MetainfoNormalizer, section=ExampleSection, exc_info=‘NoneType’ object has no attribute ‘test’)”

mscheidgen · July 10, 2023, 2:02pm

Yes, your data does not have a subsection1 instance. Try this:

data:
    m_def: "nomadschemaexample.ExampleSection"
    subsection1:
        test: 0

In a real use-case the normalize function needs to guard against such conditions. For this example, you could also put an adapted version of the function into SubSection1. If the object does not exist, the normalize function will also not be called.
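
A guard of this kind could look roughly like the following. This is a sketch with plain dataclass stand-ins (`SubSection1Stub` and `ExampleSectionStub` are hypothetical, not NOMAD classes, and this `normalize` takes no `archive` or `logger`); in the actual schema the same `is None` check would presumably sit inside `ExampleSection.normalize` before the assignment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubSection1Stub:
    test: Optional[int] = None


@dataclass
class ExampleSectionStub:
    # None models an entry whose YAML has no subsection1 at all.
    subsection1: Optional[SubSection1Stub] = None

    def normalize(self) -> None:
        # Guard: create the sub section if the input did not provide one,
        # instead of failing with "'NoneType' object has no attribute 'test'".
        if self.subsection1 is None:
            self.subsection1 = SubSection1Stub()
        self.subsection1.test = 5


entry = ExampleSectionStub()  # corresponds to a YAML with only m_def
entry.normalize()
print(entry.subsection1.test)
```

With such a guard, the value no longer depends on the input file already containing a subsection1 block.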

fabian_li · July 12, 2023, 12:53am

Thanks for the answer. I would like my python schema to go further, so that it fulfills 5 tasks:

Task 1: Populate results.method.simulation.program_name with “test”

T2: Reading from uploaded files
Now I upload some simulation data to NOMAD. Assume that amongst the uploaded simulation data there are 2 files: output.txt and positions.dat. Further assume this output.txt contains just the number 5. Read/open this file and store its content in run.calculation.energy.highest_occupied. Please don’t use anything like ELNComponentEnum.FileEditQuantity, because I think that’s only feasible if you have very few files to upload, but my simulation program can produce like 100 output files.

T3: Using in-Built Parser and plotting
Assume that positions.dat looks like this:

1 2
3 4
5 6

The first column holds the x-axis values and the second column the y-axis values; the columns are separated by tabs. Read/open positions.dat (ideally using NOMAD’s in-built tabularParser), store its content in run.systems.atoms.positions, and plot it there as well as on the overview tab (I don’t mean the coloured 3D plot of atoms in a unit cell but just a simple 2D line plot).
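
To make the two file layouts from T2 and T3 concrete, here is a plain-Python sketch that reads them. It deliberately does not use NOMAD's tabular parser or write into any archive section; the helper names `read_single_number` and `read_two_columns` are hypothetical, and the temporary files only stand in for the uploaded output.txt and positions.dat.

```python
import tempfile
from pathlib import Path


def read_single_number(path):
    """Return the single number stored in a text file such as output.txt."""
    return float(Path(path).read_text().strip())


def read_two_columns(path):
    """Return (x_values, y_values) from a tab- or whitespace-separated file."""
    xs, ys = [], []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        x, y = line.split()[:2]
        xs.append(float(x))
        ys.append(float(y))
    return xs, ys


# Demo with temporary copies of the two example files described above.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output.txt"
    pos = Path(tmp) / "positions.dat"
    out.write_text("5\n")
    pos.write_text("1\t2\n3\t4\n5\t6\n")
    energy = read_single_number(out)
    xs, ys = read_two_columns(pos)

print(energy, xs, ys)
```

The values returned here are what would then need to be assigned to the target quantities inside a normalize function or parser.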

T4: Creating new subsection in run
In section run.calculation, create a new subsection called quenching, in there create an integer quantity called triplet-triplet and populate it with the number 5. If this task is impossible, ignore it.

T5: Problem with test.archive.yaml
Now I want to add this python schema plugin to our OASIS. From the documentation:

You simply have to add the plugin metadata to your Oasis’ nomad.yaml and mount your code into the /app/plugins directory via the volumes section of the app and worker services in your docker-compose.yaml.

So in other words, I need to send my OASIS admin 2 files: the plugin metadata (called nomad_plugin.yaml in the documentation), and the schema code. So the file test.archive.yaml, which is being used to run nomad parse tests/data/test.archive.yaml, and which looks something like this:

data:
    m_def: "nomadschemaexample.ExampleSection"

I do not have to send it to the admin? Which means test.archive.yaml is not needed to run my python schema plugin in our OASIS? If this is true, then I would kindly ask you to write this entire python schema in such a way that it does not need test.archive.yaml to function.

All in all:
Template for the schema:

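from nomad.metainfo import Quantity, Package, MSection, SubSection
from nomad.datamodel.data import EntryData
import numpy as np

m_package = Package()

# Based on EntryData so that super().normalize(...) below has a parent to call
class ExampleSection(EntryData):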
    def normalize(self, archive, logger):
        super().normalize(archive, logger)

        # Task 1
        simulation = archive.m_setdefault("results.method.simulation")
        simulation.program_name = "test"

m_package.__init_metainfo__()

Any help with T2 to T5 would be greatly appreciated. Once this works in our OASIS, I can show it to our group members, who can then write their own Python schemas based on this, and I won’t have to bother you guys so much with my questions. Also, apologies for this giant wall of text.