Discussion: Workflow and data provenance for atomistic simulations

I’m trying to get a sense of which framework is best suited for workflow management and data provenance for calculations/simulations performed on my laptop. This is something I want to get more serious about in my computations. I like atomate, but I’m not sure how to use it outside an HPC environment; more generally, it’s unclear to me how to set it up on a local cluster without a sysadmin, or on my own device. The Materials Project workshop tutorials were nice, but I couldn’t figure out how to work through them. I would also like to know whether atomate supports other DFT codes, such as ABINIT. AiiDA seems easier to set up on a personal device, although still confusing: the environment is unclear to me, as is which simulation codes are supported. Then I came across SEAMM, which in my opinion looks the most straightforward to use, but I’m not sure how powerful it is compared to atomate or AiiDA. It would be nice if someone had a table comparing features across them. In the past I’ve used ASE for data management of calculations and enjoyed it, but, at least to my knowledge, it lacks provenance and workflow capabilities.

I don’t know of any comparison table, or even a central list, of workflow software designed for or suitable for materials science projects. Every project listed on the Wikipedia page “Scientific workflow system” is geared toward bioinformatics (except VisTrails, which seems to simply be dead). On the bright side, this means you’re doing pioneering work!

I took a look at https://signac.io/ a while ago and always wanted to learn it – maybe this is a good opportunity for you to give it a try! Otherwise, my (not very helpful) advice would simply be to have a clear list of Things I Need To Achieve, and quit early on a package which isn’t helping you with those.

Thanks @srtee, I will take a look at signac.io. I also came across a recent Scientific Reports paper on the MISPR package, which seems to be similar to atomate but, nicely, includes force-field parameterization and MD simulations in the workflow. Unfortunately, at the moment the documentation is sparse and probably still in development.

I just started a topic on FAIR principles in molecular dynamics, which overlaps with what you’re discussing here!

I recently took a look at pyiron, which seems like another option for workflow management. In particular, it looks nice for interatomic potential fitting workflows. It also has a pretty straightforward API, which is nice. I will need to spend a little more time with it. I also started to read more about snakemake, which is also looking like a potential path.

I’m jumping into this discussion to link this other discussion on the #lammps forum.
Also, here is an interesting open-access article from the BATTERY 2030+ people reviewing different approaches to workflow management of scientific simulations: Workflow Engineering in Materials Design within the BATTERY 2030+ Project

I am learning AiiDA, as it has a plugin system for integrating different software.

