Uploading custom results directly using *.archive.json files

I have a question: it was mentioned that it is possible to upload the nomad metainfo manually, so that one doesn’t need a parser to upload some results. How does this work specifically, and are there some examples?

Just to have a specific, simple case for illustration: consider that one does an EOS fit manually or with ase, i.e., there is a bunch of DFT calculations (with some DFT code that already has a parser) at different volumes, with output files “outfile-volume-xx”, and now I would like to create a *.archive.json file for the EOS workflow manually. What should it look like?

If I upload something like this, the archive parser is invoked but I end up with an empty entry; the workflow data is not detected (I would expect the entry to show that this is an EOS workflow and to display the EOS figure):

{
  "archive": {
    "run": [
      {
        "program": {
          "name": "Custom EOS fitting program",
          "version": "0.0.1"
        }
      }
    ],
    "workflow": [
      {
        "type": "equation_of_state",
        "equation_of_state": {
          "volumes": [
            2.590176255044027e-29,
            2.857923598843761e-29,
            2.9511096938602546e-29,
            2.7667204026517606e-29,
            3.046298974841937e-29,
            3.2427757312672296e-29,
            3.143514675276939e-29,
            3.344106299956417e-29,
            3.770528810636432e-29,
            3.882517693534008e-29,
            3.660714751228673e-29,
            2.677478885768968e-29,
            4.1131037855971765e-29,
            4.3526435740256103e-29,
            4.475824695172308e-29,
            3.447523997289499e-29,
            4.2317440927593925e-29,
            4.6013081082379197e-29,
            3.5530543183707506e-29,
            4.7291148140639506e-29,
            3.996702460217774e-29
          ],
          "energies": [
            -2.454369723290595e-19,
            -2.731979229153517e-19,
            -2.7949855983357933e-19,
            -2.6551520196951336e-19,
            -2.845467580536103e-19,
            -2.9133321938766833e-19,
            -2.8845392853258726e-19,
            -2.9327794699363573e-19,
            -2.9339910399124295e-19,
            -2.918733347603743e-19,
            -2.943678865289055e-19,
            -2.5631193488863316e-19,
            -2.873697548304791e-19,
            -2.81253344192512e-19,
            -2.777022122089516e-19,
            -2.9437851937413706e-19,
            -2.844879185172708e-19,
            -2.738697235888711e-19,
            -2.947165317802445e-19,
            -2.697911077931872e-19,
            -2.898459384449899e-19
          ],
          "eos_fit": [
            {
              "function_name": "mie_gruneisen",
              "fitted_energies": [
                -2.4538980481488733e-19,
                -2.732328212717778e-19,
                -2.79522031112045e-19,
                -2.655501158550913e-19,
                -2.845524431297313e-19,
                -2.9131167432361963e-19,
                -2.884457506442127e-19,
                -2.9324954409225927e-19,
                -2.9339281267625726e-19,
                -2.9187880848840325e-19,
                -2.94350554656235e-19,
                -2.5632447670213415e-19,
                -2.8739657380753024e-19,
                -2.812853006161564e-19,
                -2.7771947779445384e-19,
                -2.943490663664639e-19,
                -2.845234647497659e-19,
                -2.738600624494e-19,
                -2.946915428658402e-19,
                -2.6973805258604493e-19,
                -2.8986344187533134e-19
              ],
              "bulk_modulus": 192543470835.1147,
              "bulk_modulus_derivative": 6.244142129575867,
              "equilibrium_volume": 3.5513500180643385e-29,
              "equilibrium_energy": -2.9469163027943346e-19
            }
          ]
        }
      }
    ]
  }
}

You are right: there is an archive parser and you can use it. Your files have to be named *.archive.json or *.archive.yaml.

Keep in mind that the archive (i.e. ArchiveEntry) instance is the top-level object. In your example, you must not have an archive key; put the data directly at the top level of the file, like this:

{
  "run": [ { ... } ],
  "workflow": [ ... ]
}
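
If it helps as a starting point, a small Python script along these lines could assemble and write such a file, e.g. after an EOS fit done by hand or with ase. This is only a minimal sketch: the file name, variable names, and numerical values are placeholders, and the keys simply mirror the structure of your example (with everything in SI units, as in your example: volumes in m³ and energies in J):

import json

# Results of the EOS fit, in SI units as in the example above
# (volumes in m^3, energies in J). The values here are placeholders.
volumes = [2.59e-29, 2.86e-29, 2.95e-29]      # one entry per DFT calculation
energies = [-2.45e-19, -2.73e-19, -2.79e-19]  # total energies at those volumes
fitted_energies = [-2.45e-19, -2.73e-19, -2.80e-19]

archive_data = {
    "run": [
        {
            "program": {
                "name": "Custom EOS fitting program",
                "version": "0.0.1",
            }
        }
    ],
    "workflow": [
        {
            "type": "equation_of_state",
            "equation_of_state": {
                "volumes": volumes,
                "energies": energies,
                "eos_fit": [
                    {
                        "function_name": "mie_gruneisen",
                        "fitted_energies": fitted_energies,
                        "bulk_modulus": 1.93e11,         # Pa
                        "bulk_modulus_derivative": 6.24,
                        "equilibrium_volume": 3.55e-29,  # m^3
                        "equilibrium_energy": -2.95e-19, # J
                    }
                ],
            },
        }
    ],
}

# Note: no top-level "archive" key; the sections go directly into the file.
with open("eos.archive.json", "w") as f:
    json.dump(archive_data, f, indent=2)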

You can also test your JSON with the nomad command: nomad parser yourfile.archive.json --show-archive.

A few notes about the archive parser:

  • The uploaded files can be json or yaml (see the short sketch after these notes for writing the yaml variant).
  • Technically, the uploaded file is still a mainfile and there is still an archive as the result of “parsing”. What you see on the data tab is the archive, not your provided file, even though both would be mostly the same.
  • Anything that does not match the schema will be ignored. In your example, this happened to the archive key and hence to all of your content, leaving you with an “empty” entry.
  • DFT-specific normalisers are not applied. It could be discussed whether this should be changed.
  • If you add a metadata key, it will be ignored, even though there is a metadata sub-section in the schema. This special section is supposed to be determined by the system, not by uploaders.
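
If you want the yaml variant instead, the same content can be converted with a few lines of Python, assuming PyYAML is installed (the file names refer to the sketch above):

import json
import yaml

# Convert the JSON archive file from the sketch above into the
# equivalent *.archive.yaml file; both formats are accepted.
with open("eos.archive.json") as f:
    archive_data = json.load(f)

with open("eos.archive.yaml", "w") as f:
    yaml.safe_dump(archive_data, f, sort_keys=False)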

Thanks for the clarification; without the archive top-level key it works like a charm. Just a few more questions:

  • What happens when the metainfo schema changes in the future and a mass reprocessing happens? Will the entries potentially become blank (and non-searchable) again?
  • In which part of the metainfo am I supposed to link the underlying DFT calculations (i.e., so that someone looking at the EOS can easily find the individual DFT calculations corresponding to the points)?

The answers to both questions are still under a lot of discussion on our side.

  • We think of the archive as two parts. One is schema-flexible and might undergo frequent changes; this is where domain-specific data goes. From the theory side, this is mostly the section run. We are establishing a schema versioning mechanism that allows different versions to coexist at the same time, i.e., non-parsed data is not subjected to re-processing and we will keep the old version. The other part of the archive will be more fixed and will not undergo frequent changes. This is the part with the sections results, workflow, metadata, etc. If we need changes here, data migration for non-parsed data will be necessary.

  • Typically that would be the section workflow. But we are still establishing the right structure here, so I can’t really give a good answer yet.