Proper usage of "name" field

jshen · November 23, 2022, 5:57pm

I think we need to make some decisions about how to use the task_label, which currently only uses the name field from the job.

materialsproject/atomate2/blob/73b9b3e71be8addc237c9d6cfce3dd1c09aa2ca8/src/atomate2/vasp/jobs/base.py#L151

        # write any additional data
        for filename, data in self.write_additional_data.items():
            dumpfn(data, filename.replace(":", "."))

        # run vasp
        run_vasp(**self.run_vasp_kwargs)

        # parse vasp outputs
        task_doc = TaskDocument.from_directory(Path.cwd(), **self.task_document_kwargs)
        task_doc.task_label = self.name

        # decide whether child jobs should proceed
        stop_children = should_stop_children(task_doc, **self.stop_children_kwargs)

        # gzip folder
        gzip_dir(".")

        return Response(
            stop_children=stop_children,
            stored_data={"custodian": task_doc.custodian},

I think the name of a job is not a very well-protected field, since many workflows use it for bookkeeping.
We should make this a bit more formal and make sure that the task label only contains values that are interoperable with atomate1 tasks databases.

This is related to this PR on the foundation repo: "Create 0002-scope-of-emmet-models.md" by rkingsbury (Pull Request #2, materialsproject/foundation).
So I'm not sure where to discuss this.

mkhorton · November 23, 2022, 8:07pm

Were we discussing this in a PR somewhere too?

I believe task_label was also not particularly well-defined in atomate v1. I’ve always been of the opinion that relying on task label as reliable metadata was dangerous, since it’s typically not encoded in the launch directory (correct?) so can get lost on re-parse.
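
To make the re-parse concern concrete, here is a minimal sketch of one possible mitigation: storing the label in the launch directory so a later parse can recover it. The file name `task_label.json` and both helper functions are hypothetical; nothing in this thread says atomate or atomate2 writes such a file.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical file name, used only for this illustration.
LABEL_FILE = "task_label.json"


def write_label(run_dir: Path, label: str) -> None:
    """Persist the label next to the calculation outputs."""
    (run_dir / LABEL_FILE).write_text(json.dumps({"task_label": label}))


def read_label(run_dir: Path) -> Optional[str]:
    """Recover the label on re-parse; None if the directory has no label file."""
    path = run_dir / LABEL_FILE
    if not path.exists():
        return None
    return json.loads(path.read_text())["task_label"]


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    missing = read_label(run_dir)    # nothing stored yet
    write_label(run_dir, "static")
    recovered = read_label(run_dir)  # label read back from disk
```

Without something like this in the directory, a re-parse has only the VASP files to work from, which is the failure mode described above.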

jshen · November 23, 2022, 10:13pm

I wanted to put this here first since this is a code-specific issue, but we can bring it to the other PR once the higher-level questions have been decided.

Does anyone know if the validation builder can be used for this kind of purpose? @rkingsbury @munrojm

rkingsbury · November 28, 2022, 6:00am

I don’t know about the validation builder, but I can say that 1) I have frequently used the task_label to add user-defined metadata to a job name and 2) it is not very robust!

I think it is important to have a human-interpretable name for task documents, but enforcing a rigid schema would be valuable.
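
As a rough sketch of what a rigid schema could look like while still leaving room for a human-readable note: the label starts with a word from a controlled vocabulary, and anything after it is free text. The `AllowedLabel` values and the `split_label` helper are hypothetical; the real vocabulary would still need to be decided.

```python
from enum import Enum
from typing import Tuple


class AllowedLabel(str, Enum):
    """Hypothetical controlled vocabulary, for illustration only."""

    RELAX = "relax"
    STATIC = "static"
    UNIFORM = "uniform"


def split_label(raw: str) -> Tuple[AllowedLabel, str]:
    """Split a raw label into (controlled label, free-form note).

    Raises ValueError if the first word is not in the vocabulary.
    """
    head, _, note = raw.strip().partition(" ")
    return AllowedLabel(head), note.strip()


label, note = split_label("static bulk supercell")
```

A name such as "my custom job" would be rejected with a `ValueError` instead of silently ending up in the database.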

jshen · November 28, 2022, 7:58pm

So I think atomate2 should have a task_label populated by the workflow in some way that is more protected than the name field, which currently gets modified a bit by the more complex workflows:

materialsproject/atomate2/blob/ff5f0b0ba3e80138515600ba0950756d8486c84d/src/atomate2/vasp/flows/elph.py#L165

        elph = elph_maker.make(
            static.output.structure, prev_vasp_dir=static.output.dir_name
        )

        # use static as prev_dir so we don't inherit elph settings; using a prev
        # directory is useful as we can turn off magnetism if necessary which gives a
        # reasonable speedup
        supercell_dos = self.uniform_maker.make(
            elph.output.structure, prev_vasp_dir=static.output.dir_name
        )
        supercell_dos.append_name(" bulk supercell")

        displaced_doses = run_elph_displacements(
            elph.output.calcs_reversed[0].output.elph_displaced_structures.temperatures,
            elph.output.calcs_reversed[0].output.elph_displaced_structures.structures,
            self.uniform_maker,
            prev_vasp_dir=static.output.dir_name,
            original_structure=static.output.structure,
            supercell_structure=elph.output.structure,
        )
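
To illustrate the separation I have in mind, here is a toy sketch. `ToyJob` is a stand-in written for this post, not the jobflow `Job` class, and the protected `task_label` attribute is hypothetical. The label is fixed when the job is created, and later bookkeeping edits only touch the display name.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJob:
    """Stand-in for a workflow job; not the real jobflow Job class."""

    name: str
    # Hypothetical protected field, fixed when the job is created.
    task_label: str = field(init=False)

    def __post_init__(self) -> None:
        self.task_label = self.name

    def append_name(self, suffix: str) -> None:
        """Bookkeeping edits change the display name only."""
        self.name += suffix


job = ToyJob(name="uniform")
job.append_name(" bulk supercell")
# job.name is now "uniform bulk supercell"; job.task_label is still "uniform"
```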

Maybe we can use this to tag things:

materialsproject/atomate2/blob/73b9b3e71be8addc237c9d6cfce3dd1c09aa2ca8/src/atomate2/vasp/schemas/calc_types/utils.py#L122

    elif incar.get("IBRION", 1) == 0:
        acalc_type.append("MD")

    if len(acalc_type) == 0:
        return TaskType("Unrecognized")

    return TaskType(" ".join(acalc_type))


def calc_type(
    inputs: Dict[Literal["incar", "poscar", "kpoints", "potcar"], Dict],
    vasp_parameters: Dict,
) -> CalcType:
    """
    Determine the calc type.

    Parameters
    ----------
    inputs
        Inputs dict with an incar, kpoints, potcar, and poscar dictionaries.
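
The idea of tagging from the inputs rather than from the job name can be sketched with a deliberately simplified toy. It is not the atomate2 function: `toy_task_label` is a hypothetical name, and it reuses only the two rules visible in the excerpt above (IBRION of 0 means "MD", and no match means "Unrecognized").

```python
from typing import Dict, List


def toy_task_label(incar: Dict[str, int]) -> str:
    """Derive a label from INCAR settings only; the job name is never consulted."""
    parts: List[str] = []
    # Rule taken from the excerpt above: IBRION == 0 marks molecular dynamics.
    if incar.get("IBRION", 1) == 0:
        parts.append("MD")
    # Fallback taken from the excerpt: no matching rule means "Unrecognized".
    return " ".join(parts) if parts else "Unrecognized"


md_label = toy_task_label({"IBRION": 0})
other_label = toy_task_label({"IBRION": 2})
```

Because the label is computed from the stored inputs, it can be regenerated on re-parse and cannot be changed by workflows that edit job names.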