Using AI tools for LAMMPS development

Dear all,

A few days ago there was an announcement about a new AI model called Devin which, according to its developers, is the first AI software engineer. In theory, it can develop full features or entire programs from simple text prompts. The model has not yet been fully released to the public (people may submit a task/request that might be picked up).

I am really curious to know the experiences/perspectives of LAMMPS developers with such tools. Are there ideas for leveraging them to enhance the LAMMPS “experience”?

Thanks, Evangelos

I have not tried any AI tools for writing software. People who have tell me that the quality of the resulting code depends very much on the programming language, with Python being the language with the best results.

I doubt an AI tool has a good chance of developing LAMMPS source code, since

  • the AI would have to be specifically trained on LAMMPS source code with all its quirks and wrinkles
  • the different files represent different programming styles of different authors, and while we are trying to encourage a more consistent coding and naming(!) style in new additions, there is lots of legacy code that doesn’t comply.
  • for a review, I had to look into an effort to create LAMMPS input files with LLM-based generative AI, and the results were flat-out bad except for trivial cases (e.g. straining a bulk metal system). The worst result was a Python script containing the comment “TODO: need to write a lammps input file that does XXX”.

The most useful automated tools I have come across so far are “static code analysis tools”, e.g. Coverity Scan, GitHub’s CodeQL, clang-tidy. But those are more “pattern”-based, and they produce a lot of false positives, flagging issues that are not actually issues. Example: you can check for vflag_global or vflag_atom or vflag_either. Because of what happens when ev_init() is called, I know that vflag_either is non-zero when either or both of the other two variables are non-zero. But none of the automated tools I have come across know that, and so they complain about variables being unused, uninitialized, or similar.
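The invariant described above can be shown with a toy Python model (not the actual LAMMPS source, which is C++): the derived flag is guaranteed non-zero whenever either input flag is, which a human reviewer relies on but a pattern-based analyzer cannot see.

```python
# Toy model of the ev_init()/ev_setup() flag logic described above
# (invented simplification, not LAMMPS code): vflag_either is derived
# from the other two flags, so testing vflag_either alone is safe.

def ev_init(vflag_global, vflag_atom):
    """Mimic the derived flag: non-zero iff either input is non-zero."""
    return 1 if (vflag_global or vflag_atom) else 0

# The property a human reviewer knows holds by construction:
for g in (0, 1):
    for a in (0, 1):
        assert ev_init(g, a) == (1 if (g or a) else 0)
```

A pattern-based tool that inspects each flag independently cannot prove this relation and therefore warns anyway.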

The challenge with modifying LAMMPS is that individual sections of code are not really independent: sometimes order matters, or it matters which communication to do when and in which order data is sent, or certain operations may only be done in specific parts of the timestep workflow, since only then does the data represent meaningful physics; elsewhere it may be bogus. A typical example is when (or whether) to do a reverse or a forward communication and how these correlate with the newton_pair or newton_bond settings.
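The newton_pair/reverse-communication coupling can be sketched in a toy 1D Python model (invented names, plain Python, not LAMMPS code): with newton_pair on, a cross-boundary pair force is computed once, accumulated on a ghost atom, and must then be reverse-communicated (summed back to the owning processor) before forces are used; doing things in the wrong order leaves a bogus force on the owner.

```python
# Toy sketch: proc 0 owns atom A and holds a ghost copy of atom B
# (owned by another proc). With newton_pair on, the A-B pair force is
# computed only once, on proc 0.

def compute_cross_boundary_pair(pair_force):
    f_owned = {"A": 0.0, "B": 0.0}       # forces on owned atoms
    f_ghost_on_proc0 = {"B": 0.0}        # force accumulated on the ghost

    # pair computation (once, on proc 0)
    f_owned["A"] += pair_force
    f_ghost_on_proc0["B"] -= pair_force  # action-reaction on the ghost

    # reverse communication: the ghost contribution is summed into the
    # owner; skipping or reordering this step would leave f_owned["B"]
    # at 0.0, i.e. physically wrong data
    f_owned["B"] += f_ghost_on_proc0["B"]
    return f_owned

forces = compute_cross_boundary_pair(1.5)
assert forces["A"] == -forces["B"]       # Newton's third law recovered
```

With newton_pair off, both procs would compute the pair for their own atom and no reverse communication of pair forces would be needed, which is exactly the kind of correlation a code generator must get right.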

Bottom line: I expect that it will take some time until these tools can be more specific and accurate. What I’ve seen so far is that generative AI is capable of producing something that “looks like” other LAMMPS code, but it is not capable of verifying whether the physics is right.


Maybe a year ago I tried to use ChatGPT to port existing LAMMPS code to Kokkos, and it produced only hot garbage. So, like Axel, I think this is still a ways out, but it is getting closer every day…


As I am digging into AI for research projects at the moment, I highly doubt that we will come close to AI writing (useful) code and new features anytime soon, especially for very specific purposes like molecular simulation codes. The reasons are essentially technical and related to the way machine learning, and especially deep learning, is envisioned at the moment.

Most (supervised) DL workflows rely on the same concepts:

  1. Get big (sometimes labeled) data set.
  2. Split it in training and testing parts.
  3. Train your model, and test it.
  4. If the results are not satisfying, adjust hyperparameters.
  5. Rinse and repeat from 3.
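The steps above can be sketched as a minimal, self-contained Python example (a toy one-parameter “model”, invented purely for illustration): split a labeled data set, fit by sweeping a hyperparameter on the training part, then evaluate on the held-out part.

```python
import random

def train_test_split(data, test_frac=0.2, seed=42):
    """Step 2: shuffle and split into training and testing parts."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1.0 - test_frac))
    return shuffled[:cut], shuffled[cut:]

def accuracy(threshold, data):
    """The "model" classifies x as 1 if x > threshold."""
    hits = sum(1 for x, y in data if (x > threshold) == bool(y))
    return hits / len(data)

# Step 1: a labeled toy data set (label is 1 when x > 0.5)
data = [(x / 100.0, int(x > 50)) for x in range(100)]
# Step 2: split
train_set, test_set = train_test_split(data)
# Steps 3-5: "train" by sweeping the hyperparameter (the threshold)
# on the training set, then test on the held-out set
best_threshold = max((t / 10.0 for t in range(10)),
                     key=lambda t: accuracy(t, train_set))
test_acc = accuracy(best_threshold, test_set)
```

Real workflows tune hyperparameters on a separate validation split rather than the training set alone, but the loop structure is the same.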

The main problems stem from getting sufficient data with limited biases and with enough diversity of situations to make the model useful in a variety of contexts, notably by making relevant suggestions in new (i.e. unseen) situations and limiting “hallucinations”.

While it is relatively easy to scrape a lot of code from public repos (and I am clearly not saying this is a good thing to do), the problem is that you are unsure of the quality of the code you get and of the problems it aimed to solve. Ask any developer about the average quality of the code they see on a daily basis and you’ll get an opinion of the quality of the database used to train coding AIs; the word “bloated” will likely come up. It is very hard to pin down “good programming practice”, as there are a lot of recommendations, sometimes contradictory, and rules necessarily get adapted to teamwork contexts and specific problems. Not to mention last-minute tinkering that ends up in production even though we know it is bad practice. I can’t resist illustrating this with the famous YouTube video titled “the rapidly dwindling sanity of valve programmers as expressed through code comments”, which clearly shows the kind of stuff you can find in the production code of major software.

There is also hype around LLMs that is clearly due to their ease of access as well as their formal and confident tone.[^1] A recent paper by Purdue researchers showed that people tend to prefer ChatGPT coding advice on average, even though it is wrong more than 50% of the time (in their dataset), compared to Stack Overflow answers, which are more often correct. I am not sure what to make of this, but I tend to think it illustrates that you already need to have an idea of what you want to do if you want to use ML-generated code, and that true “black box” use is far off, especially for generating highly abstract features. It’s not anytime soon that we will be able to tell any coding ML, “Hey CodeGPT, can you make me a free-energy calculation plugin for both LAMMPS and GROMACS? It should add a force constraint on position along a simulation blah blah blah.”

So yeah. I too have mixed feelings about code-generation AI, even with specific training.
I also think there is a high chance that it disturbs people (especially newcomers) more than it helps, and I haven’t even considered the rapid pace at which some languages evolve (take Python), which makes the integration of recent features even harder for ML models.

I also personally have an issue with a private company owning a generative model that people would tend to rely on for critical work, but that’s another problem.

[^1]: This sentence was written by Captain Obvious.