Scope for making LAMMPS Automatically Differentiable

patherlkd · February 28, 2023, 5:08pm

Dear LAMMPS developers,

With the growing use of machine learning and optimization techniques (that are powered by automatic differentiation Automatic differentiation - Wikipedia), and their adoption and use in the field of molecular modelling, I thought it could be worth discussing whether making LAMMPS - a very efficient, flexible, and well-tested code with an active community - automatically differentiable is worthwhile.

The research motivation includes performing precise and scalable optimal control, exploring design spaces, and generally: the numerical optimization of well-posed objective functions that require derivatives of whole simulations.

JAX-MD does this: https://arxiv.org/pdf/1912.04232.pdf, but it is (considering CPU only) ~10x slower than LAMMPS and does not exhibit the same power in terms of what one can simulate.

Current ways of turning a C++ code into code that is differentiable involve operator overloading or source code transformation (e.g. with Adept: Adept: A combined automatic differentiation and array library in C++ or with autodiff GitHub - autodiff/autodiff: automatic differentiation made easier for C++).

Possible pros:

Considerable expansion to what LAMMPS can do in terms of machine learning and optimization (and other relevant research questions).
Competitive edge over existing automatically differentiable MD packages due to speed and huge existing codebase.

Possible cons/difficulties:

Due to the size of the codebase, this might be a huge task (or one that requires alot of dedication). Operator overloading strategy might be best in this case.
Extra difficulty in navigating parallel codebase. Ensuring loss in efficiency is minimal.
Making sure all of the existing codebase does not break (significant testing).

Best,

Dr. Luke Davis

Department of Mathematics
University College London

akohlmey · February 28, 2023, 5:25pm

Luke,

I think what you are proposing is a very niche application for what the kind of problems that LAMMPS is designed to solve. Thus a complete transformation of the entire code base is an absolute no-no. The original design of LAMMPS was a “C with classes” approach since polymorphism in C++ solved growing maintenance problem with the Fortran 90 version (had Fortran supported it at the time, LAMMPS might still be written in Fortran now). Another design goal was to make it easy to modify and extend LAMMPS. Thus we have been very careful with introducing modern C++ elements and mostly do it where there are significant benefits to code readability and maintainability over the equivalent construct written in C or using C library functions.

That said, we recently incorporated the Lepton library from OpenMM and adapted it to implement pair, bond, angle, and dihedral styles and a change for implementing wall interactions is on the way. These allow to specify interactions as expression strings that are translated into executable code at run time (using a JIT compiler when using a supported platform). In this case the symbolic differentiation and optimization is implemented by the Lepton library.

That said, I also don’t quite understand what you mean by derivatives of whole simulations. Simulations are typically characterized by statistical analysis of trajectory data accumulated over sufficiently large periods of simulation times. Or you are looking to approximate free energies for predefined processes? Or you want to derive material properties under specific circumstances or in relation to the composition of materials. How can you take a derivative of that?

Bottom line, if you want to do something like you describe, it should be a project of its own. That said, LAMMPS is distributed under the GPL v2 license and you are free to use as many components of it as you like provided you put your project under a compatible license.

patherlkd · February 28, 2023, 5:33pm

Fair enough.

Derivatives of whole simulations: taking gradients of a cost function that depends on the dynamical trajectory of the constituent particles where computing the analytical derivative is nigh impossible. But since the simulation is ultimately just computer code automatic differentiation will be able to find the exact derivative (given the chain rule can be propagated correctly through correct operator overloading).

I’ll work on something myself ;).

Best,

L