Uploading custom results directly using *.archive.json files

I have a question: it was mentioned that it is possible to upload the nomad metainfo manually, so that one doesn’t need a parser to upload some results. How does this work specifically, and are there some examples?

Just to have a specific, simple case for illustration: consider that one does an EOS fit manually or with ase, i.e., there is a bunch of DFT calculations (with some DFT code that already has a parser) at different volumes, with output files “outfile-volume-xx”, and now I would like to create a *.archive.json file for the EOS workflow manually. What should it look like?

If I upload something like this, the archive parser is invoked but I end up with an empty entry; the workflow data is not detected (I would expect the entry to show that this is an EOS workflow and to display the EOS figure):

{
  "archive": {
    "run": [
      {
        "program": {
          "name": "Custom EOS fitting program",
          "version": "0.0.1"
        }
      }
    ],
    "workflow": [
      {
        "type": "equation_of_state",
        "equation_of_state": {
          "volumes": [
            2.590176255044027e-29,
            2.857923598843761e-29,
            2.9511096938602546e-29,
            2.7667204026517606e-29,
            3.046298974841937e-29,
            3.2427757312672296e-29,
            3.143514675276939e-29,
            3.344106299956417e-29,
            3.770528810636432e-29,
            3.882517693534008e-29,
            3.660714751228673e-29,
            2.677478885768968e-29,
            4.1131037855971765e-29,
            4.3526435740256103e-29,
            4.475824695172308e-29,
            3.447523997289499e-29,
            4.2317440927593925e-29,
            4.6013081082379197e-29,
            3.5530543183707506e-29,
            4.7291148140639506e-29,
            3.996702460217774e-29
          ],
          "energies": [
            -2.454369723290595e-19,
            -2.731979229153517e-19,
            -2.7949855983357933e-19,
            -2.6551520196951336e-19,
            -2.845467580536103e-19,
            -2.9133321938766833e-19,
            -2.8845392853258726e-19,
            -2.9327794699363573e-19,
            -2.9339910399124295e-19,
            -2.918733347603743e-19,
            -2.943678865289055e-19,
            -2.5631193488863316e-19,
            -2.873697548304791e-19,
            -2.81253344192512e-19,
            -2.777022122089516e-19,
            -2.9437851937413706e-19,
            -2.844879185172708e-19,
            -2.738697235888711e-19,
            -2.947165317802445e-19,
            -2.697911077931872e-19,
            -2.898459384449899e-19
          ],
          "eos_fit": [
            {
              "function_name": "mie_gruneisen",
              "fitted_energies": [
                -2.4538980481488733e-19,
                -2.732328212717778e-19,
                -2.79522031112045e-19,
                -2.655501158550913e-19,
                -2.845524431297313e-19,
                -2.9131167432361963e-19,
                -2.884457506442127e-19,
                -2.9324954409225927e-19,
                -2.9339281267625726e-19,
                -2.9187880848840325e-19,
                -2.94350554656235e-19,
                -2.5632447670213415e-19,
                -2.8739657380753024e-19,
                -2.812853006161564e-19,
                -2.7771947779445384e-19,
                -2.943490663664639e-19,
                -2.845234647497659e-19,
                -2.738600624494e-19,
                -2.946915428658402e-19,
                -2.6973805258604493e-19,
                -2.8986344187533134e-19
              ],
              "bulk_modulus": 192543470835.1147,
              "bulk_modulus_derivative": 6.244142129575867,
              "equilibrium_volume": 3.5513500180643385e-29,
              "equilibrium_energy": -2.9469163027943346e-19
            }
          ]
        }
      }
    ]
  }
}

You are right: there is an archive parser and you can use it. Your files have to be named *.archive.json or *.archive.yaml.

Keep in mind that the archive (i.e. ArchiveEntry) instance is the top-level object. In your example, you must not have an archive key; put the data directly at the top level of the file, like this:

{
  "run": [ { ... } ],
  "workflow": [ ... ]
}
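
If it helps as a starting point, a small Python script along these lines could assemble and write such a file, e.g. after an EOS fit done by hand or with ase. This is only a minimal sketch: the file name, variable names, and numerical values are placeholders, and the keys simply mirror the structure of your example (with everything in SI units, as in your example: volumes in m³ and energies in J):

import json

# Results of the EOS fit, in SI units as in the example above
# (volumes in m^3, energies in J). The values here are placeholders.
volumes = [2.59e-29, 2.86e-29, 2.95e-29]      # one entry per DFT calculation
energies = [-2.45e-19, -2.73e-19, -2.79e-19]  # total energies at those volumes
fitted_energies = [-2.45e-19, -2.73e-19, -2.80e-19]

archive_data = {
    "run": [
        {
            "program": {
                "name": "Custom EOS fitting program",
                "version": "0.0.1",
            }
        }
    ],
    "workflow": [
        {
            "type": "equation_of_state",
            "equation_of_state": {
                "volumes": volumes,
                "energies": energies,
                "eos_fit": [
                    {
                        "function_name": "mie_gruneisen",
                        "fitted_energies": fitted_energies,
                        "bulk_modulus": 1.93e11,         # Pa
                        "bulk_modulus_derivative": 6.24,
                        "equilibrium_volume": 3.55e-29,  # m^3
                        "equilibrium_energy": -2.95e-19, # J
                    }
                ],
            },
        }
    ],
}

# Note: no top-level "archive" key; the sections go directly into the file.
with open("eos.archive.json", "w") as f:
    json.dump(archive_data, f, indent=2)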

You can also test your JSON with the nomad command: nomad parser yourfile.archive.json --show-archive.

A few notes about the archive parser:

  • The uploaded files can be json or yaml (see the short sketch after these notes for writing the yaml variant).
  • Technically, the uploaded file is still a mainfile and there is still an archive as the result of “parsing”. What you see on the data tab is the archive, not your provided file, even though both would be mostly the same.
  • Anything that does not match the schema will be ignored. In your example, this happened to the archive key and hence to all of your content, leaving you with an “empty” entry.
  • DFT-specific normalisers are not applied. It could be discussed whether this should be changed.
  • If you add a metadata key, it will be ignored, even though there is a metadata sub-section in the schema. This special section is supposed to be determined by the system, not by uploaders.
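
If you want the yaml variant instead, the same content can be converted with a few lines of Python, assuming PyYAML is installed (the file names refer to the sketch above):

import json
import yaml

# Convert the JSON archive file from the sketch above into the
# equivalent *.archive.yaml file; both formats are accepted.
with open("eos.archive.json") as f:
    archive_data = json.load(f)

with open("eos.archive.yaml", "w") as f:
    yaml.safe_dump(archive_data, f, sort_keys=False)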

Thanks for the clarification; without the archive top-level key it works like a charm. Just a few more questions:

  • What happens when the metainfo schema changes in the future and a mass reprocessing happens? Will the entries potentially become blank (and non-searchable) again?
  • In which part of the metainfo am I supposed to link the underlying DFT calculations (i.e., so that someone looking at the EOS can easily find the individual DFT calculations corresponding to the points)?

The answers to both questions are still under a lot of discussion on our side.

  • We think of the archive as two parts. One is schema-flexible and might undergo frequent changes; this is where domain-specific data goes. From the theory side, this is mostly the section run. We are establishing a schema versioning mechanism that allows different versions to coexist at the same time, i.e., non-parsed data is not subjected to re-processing and we will keep the old version. The other part of the archive will be more fixed and will not undergo frequent changes. This is the part with the sections results, workflow, metadata, etc. If we need changes here, data migration for non-parsed data will be necessary.

  • Typically that would be the section workflow. But we are still establishing the right structure here, so I can’t really give a good answer yet.