Best practice for putting VASP output data (e.g. total energy) in the fw_spec

Hello everybody!

If I use e.g. StaticFW or OptimizeFW in a larger workflow, the results of the calculations will be put into the database. If I need some output later on in another Firework, I want to put it in the fw_spec, which I can then pass on to the next Firework.

My question is whether I should use database queries when I later need an output value, such as the total energy, in another Firework. If so, how would I properly identify the correct calculation? With code similar to the tutorial about the MgO band structure, I think I might run into problems if I have several structures for the same compound. Should I pass the Firework ID with a powerup? Can you recommend some documentation about MongoDB queries, especially for the VaspCalcDb case?

Intuitively, I would rather write a new Firework that copies the output files using PassCalcLocs and then just parses the OUTCAR using Outcar from pymatgen.io.vasp.outputs. However, using the database seems very powerful, and I want to use best practice from the beginning if possible.

Thanks a lot, Michael

Hi Michael,

Thanks for the question - this is somewhat of an unresolved design question for atomate workflows. As you point out, the general strategies are:

  1. Passing the information directly between the Fireworks in the workflow, using the update_spec or mod_spec in the FWAction.
  2. Having a label for each of the Fireworks in the workflow (so you can identify all the Fireworks in a given workflow). Then, use the database to retrieve the relevant information from the Firework with the desired label.
  3. Something else, such as passing the location of the current run to the next Firework and having the next Firework re-parse the output of the previous Firework to get what it needs.
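
As a minimal sketch of option 1, the snippet below builds the keyword arguments that a Firetask could hand to FWAction. The helper name `build_action_kwargs` and the spec keys `total_energy` and `energies` are hypothetical, chosen only for illustration; `update_spec`, `mod_spec`, and the `_push` operator are the FireWorks features referred to above.

```python
def build_action_kwargs(energy, accumulate=False):
    """Return keyword arguments for fireworks.FWAction.

    update_spec overwrites a key in the spec of the child Fireworks;
    mod_spec with "_push" appends to a list instead, which helps when
    several parent Fireworks each contribute one value.
    """
    if accumulate:
        return {"mod_spec": [{"_push": {"energies": energy}}]}
    return {"update_spec": {"total_energy": energy}}


# Inside a Firetask's run_task (requires FireWorks to be installed):
#     return FWAction(**build_action_kwargs(energy))
single = build_action_kwargs(-12.34)
pushed = build_action_kwargs(-12.34, accumulate=True)
```

The downstream Firetask would then read the value from its own fw_spec under the same key.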

Typically we have gone with either option 1 or 2, but I think going forward option 2 is going to be better. This is because we want to make the Fireworks within a workflow (e.g., StaticFW or OptimizeFW) interchangeable between different workflows. That is, StaticFW won’t know whether it’s embedded within an optical calculation workflow, an elastic tensor workflow, etc., so it won’t know what information it is supposed to pass to the next step. However, going with option 2 will mean increased reliance on database connections being available.

Thus, if some downstream Firework requires the total energy from a previous Firework, I’d say the recommended option is to:

  • Add some kind of unique label to your upstream Firework that will be present in its FW spec and will also show up in the atomate output database (tasks collection).
  • Have your downstream Firework query the database for the information it needs, using the label as a way to restrict the query to the particular calculation you care about.
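
A rough sketch of the query side of this recommendation follows. It assumes the label ends up in a list field called `tags` in the tasks document and that the total energy is stored under `output.energy`; both field names are assumptions about the tasks schema and should be checked against a document in your own database. The helper name `energy_query` and the example label are hypothetical.

```python
def energy_query(tag):
    """Build (filter, projection) for a pymongo find_one() call."""
    # Matching a string against an array field selects the documents
    # whose array contains that string.
    query = {"tags": tag}
    # Return only the fields the downstream Firework needs.
    projection = {"_id": 0, "task_id": 1, "output.energy": 1}
    return query, projection


query, projection = energy_query("MgO static run 1")  # hypothetical label
# With a pymongo collection `tasks` pointing at the atomate tasks collection:
#     doc = tasks.find_one(query, projection)
#     energy = doc["output"]["energy"]
```

Because the filter depends only on the label, the downstream Firework does not need to know which workflow the upstream calculation belonged to.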

But as there is no real recommendation right now, use whatever strategy you think is best and makes your life easy.

There should be an example of the database strategy in atomate.vasp.workflows.base.bulk_modulus.get_wf_bulk_modulus (see the “tag” parameter), although there might be cleaner ways to implement it.

Dear Anubhav,
thanks for the quick and comprehensive answer.

The idea of combining a human-readable string with str(uuid4()), as shown in atomate.vasp.workflows.base.bulk_modulus.get_wf_bulk_modulus, is a very good one, I think.
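
For illustration, a small sketch of that pattern might look like the following. The helper name `make_tag`, the label text, and the `>>...<<` delimiters are illustrative choices here, not a required format.

```python
from uuid import uuid4


def make_tag(label):
    """Combine a human-readable label with a random UUID.

    The label keeps the tag recognisable to a person, while the UUID
    makes it unique even if the same label is reused for another run.
    """
    return "{} >>{}<<".format(label, uuid4())


tag_a = make_tag("MgO static")
tag_b = make_tag("MgO static")  # same label, but a different tag
```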

I still have to learn the intricacies of querying MongoDB, but this is something I need to do anyhow. I guess there is no tutorial yet on how to do that for data saved in the database using VaspToDb()? But the general MongoDB documentation, together with the docstrings of the relevant atomate methods, should be enough to get started.

Thanks again, Michael