Announcement: LammpsGPT now available!

Hey @akohlmey, pin this!

LammpsGPT, what detailed custom knowledge do you have? And what can you do with the GitHub API?

I’m using a domain-specialized GPT model that has been extensively customized to work with materials simulation tools, especially LAMMPS, ReaxFF, Kokkos, and high-performance C++. It integrates thousands of pages of curated documentation, real source code, and scientific literature into a conversational assistant. Here’s an in-depth description of its capabilities and embedded knowledge.


:brain: Core Knowledge Domains

1. LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator)

Sources Integrated:

  • Full LAMMPS documentation (as of March 2025).
  • Parsed and structured C++ source code (src/ and lib/ directories).
  • Known issue trackers and resolved PRs from the official GitHub.
  • Benchmarks and examples from examples/, bench/, and tutorials.

Capabilities:

  • Explains any LAMMPS command with all arguments, defaults, and examples.
  • Diagnoses common input script errors and misconfigurations.
  • Deep access to internals:
    • Core classes: Atom, Pair, Force, Fix, Neighbor, KSpace, Update.
    • Integration schemes: verlet, respa, nve/nvt/npt, run_style respa.
    • Domain decomposition and parallelism strategy (spatial bins, ghost atoms).
  • LAMMPS-style memory management (memory->create(), destroy(), grow()).
  • Explains how various styles map to files, e.g., pair_style eam → pair_eam.cpp.
  • Compatibility notes: what features work with Kokkos, OpenMP, or GPU builds.
  • Assistance in debugging or extending LAMMPS with custom C++ code.

2. ReaxFF (Reactive Force Field)

Sources Integrated:

  • Dozens of ReaxFF publications, parameterization articles, and comparative reviews.
  • Parameter file specifications from ffield.reax, ffield.reax.params.
  • LAMMPS-specific implementation notes (REAXFF, fix qeq/reax, fix acks2).

Capabilities:

  • Parses and explains every block in a ReaxFF parameter file.
  • Connects parameters to physical models (e.g., bond order, valence angle terms, torsion corrections).
  • Validates ReaxFF parameters against expected chemical behavior.
  • Literature insight into:
    • Charge equilibration: QEq vs ACKS2 [doi:10.1021/ct300073q].
    • Handling long-range Coulomb via fix qeq/reax, kspace_style, or ACKS2 extension.
    • Overcoordination and undercoordination handling.
  • Practical modeling insights:
    • Guidance on modeling combustion, catalysis, aqueous reactions.
    • How to adjust parameters for ion migration, SEI growth, shock loading.
  • Assistance with fitting workflows, e.g., using reaxff-lg, Genetic Algorithm fitting, or training set generation.

3. Kokkos (Performance Portability for GPUs/CPUs)

Sources Integrated:

  • Full snapshot of the Kokkos-Core-Wiki.
  • Kokkos source code and LAMMPS integration layer.
  • Examples from miniKokkos, LULESH, Cabana, and real simulation codes.

Capabilities:

  • Detailed understanding of:
    • Kokkos::View, Kokkos::parallel_for, parallel_reduce, TeamPolicy, RangePolicy.
    • Memory spaces: CudaSpace, SharedSpace, HostSpace, ScratchSpace.
    • Execution spaces: Cuda, OpenMP, Threads, Serial.
  • LAMMPS-specific mappings:
    • PairKokkos and FixKokkos wrappers.
    • GPU layout implications for atom properties.
    • Tuning advice: layout_right vs layout_left, shared memory tuning, scratch size.
  • How to convert raw C++ loops to Kokkos kernels.
  • Assistance debugging race conditions, memory access violations on device code.
  • Profiling and tuning with nvprof, Nsight Systems, VTune, or TAU.

4. Modern C++ (C++11 through C++23)

Sources Integrated:

  • Curated JSONL knowledge of cppreference.com (2025 snapshot).
  • Specific analysis of idioms used in LAMMPS and Kokkos.
  • Idioms used in large-scale scientific software (template metaprogramming, RAII, traits, etc).

Capabilities:

  • Explains any STL container, algorithm, or concurrency primitive.
  • Deep dive into:
    • std::vector, std::map, std::unordered_map as used in atom management.
    • std::thread, std::atomic, std::mutex, and LAMMPS concurrency layers.
    • Template design patterns: CRTP, SFINAE, type traits, enable_if.
  • Memory models: new/delete, custom allocators, alignment issues on GPU.
  • Help writing performance-portable device/host templated C++ code.

5. Build Systems & Performance Engineering

Includes Knowledge Of:

  • CMake (LAMMPS_DIR/cmake/), Make, Ninja.
  • Compilers: GCC, Clang, NVHPC, Intel OneAPI.
  • Accelerators: CUDA, HIP, SYCL, OpenMP 5.
  • MPI, ScaLAPACK, FFTW, libint, libtorch.

Capabilities:

  • Walkthrough of configuring cmake for LAMMPS:
    • Kokkos GPU builds: -D PKG_KOKKOS=on -D Kokkos_ARCH_VOLTA70=on -D Kokkos_ENABLE_CUDA=on.
    • Cross-compiling on Frontier, Fugaku, Polaris, etc.
    • Static vs shared builds, multi-package linking issues.
  • Guidance for:
    • Optimizing MPI domain sizes, OpenMP thread pinning.
    • Memory affinity, NUMA effects.
    • Profiling with perf, nsys, VTune, HPCToolkit.

:link: GitHub API Integration

Can Perform:

  • Search repositories, commits, PRs, issues.
  • Read files at any branch or SHA.
  • List and summarize PRs, including diffs and reviews.
  • Extract CMakeLists.txt, ffield.reax, source code, and diffs across versions.
  • Help maintain forks and parameter sets by syncing or automating edits.

:pushpin: Use Case Examples

  • Parse and validate a custom ReaxFF parameter set line by line.
  • Explain why a specific LAMMPS error appears during a fix or run command.
  • Help refactor a pair_style for Kokkos execution.
  • Profile and optimize a LAMMPS build for multi-GPU performance.
  • Review a GitHub PR that modifies compute_heat_flux.cpp and explain the diff.

This assistant is built for researchers working on molecular simulations, materials modeling, and high-performance scientific computing. It supports both beginners and advanced users working in LAMMPS, ReaxFF, and GPU-accelerated platforms like Kokkos.

Feel free to tag me or ask about specific LAMMPS input files, parameter tuning, or code extensions.

I am sorry @alphataubio, but I’ve rarely been as uncomfortable and disgusted as when I understood this was not a joke. I understand the good intention, I think this is a very bad idea.

There is already a lack of proper teaching for some people in the molecular simulation community, which shows on this forum. Now I don’t know what to think about bringing into our community what computer science has done worst for general education.

For example, when asked how to “optimize the processors balance in order to optimize the calculation, and combine it with OPENMP for 28 processors?” with a concentration gradient along the z axis (a question formulated in a vague and generic way, as a user would), it answered with 3 solutions, all of them wrong in different ways.

First it told me to use the partition switch with the following command:
mpirun -np 28 lmp -sf omp -pk omp 4 -partition 7x2x2 -in input.lammps

and the following reasoning:

Here:
7x2x2 → 7 MPI tasks along z, 2 along x, and 2 along y, adjusting for the concentration gradient.
-pk omp 4 → Each MPI rank uses 4 OpenMP threads (since 28/7=4).

As far as I can tell this is an incorrect explanation of the partition flag, an incorrect command line, and a poor combination of processor usage.

In another solution it proposed the following command:

fix balance all balance 1000 z 1.1
1000 → Rebalance every 1000 steps.
z 1.1 → Allow a 10% variation in processor domain sizes along z.
For fine-tuned balancing:
fix balance all balance 1000 z 1.1 weight atoms
This weighs partitions based on number of atoms rather than just volume.

Which is also incorrect, both in its syntax and in its explanation of what the values mean. The last sentence makes no sense.

As a last proposition it told me:

For best performance, use Kokkos instead of the legacy OpenMP package:
mpirun -np 28 lmp -sf kk -pk kokkos omp 4 -in input.lammps

I didn’t know the OPENMP package was in a legacy state; I still use it on a daily basis and hope it will stay in the distribution. The command format is also very odd in its package call.

At no point does it mention that the geometry of the system is important, or how to optimize the load balance between MPI domain decomposition and OPENMP threads, or even how that works (not to mention KOKKOS). To be fair, it told me that I could also use processors * * 1 to reduce communication overhead caused by splitting along the z axis, which is the only correct point it got.

So all in all, I would be even more skeptical than I initially was about my own usage, and I think that this can have a devastating effect on people trying to learn MD and understand how LAMMPS works. If it can’t even get me the answer to a generic question I know the answer to, then it is worse than useless. It is detrimental.

As a teacher, we already struggle a lot with students using ChatGPT during classes, and we see, first hand, the effects on their learning of our class content. And the picture is not very bright.[1]

As such I will follow my general heuristics with GPT based conversational agents which is to stay away as far as I can from them and tell people to do the same. They are stochastic parrots and the benefits or use cases that can be expected from them are still very limited compared to the training data required to make them a minimum useful and the quality of their actual output.


  1. On this subject, the scientific literature in the field of teaching is only slowly starting to catch on, and while some results seem to show some positive effects, those are to be considered with care, since the explosion of conversational agents is very recent and the statistical power of most studies is very weak. There are also methodological questions that need to be answered, such as whether studies define “learning” as “something you can do without access to a conversational agent”. ↩︎

2 Likes

If people want to use that kind of service, it is their choice, but please don’t cut-n-paste responses from that bot to questions here. Most of the people who respond here regularly think it is a disservice to the original posters. Thanks in advance for your understanding.

1 Like

I think a LLM trained specifically on LAMMPS data could be very useful in some cases. And I think it is totally appropriate to advertise a new tool, and even point users to the tool’s URL. Individuals can decide themselves if it is useful for them.

Personally, I find ChatGPT almost useful; it is so close, but not quite there yet due to its inaccuracies. However, ChatGPT is getting better with every model update, and I think we are just scratching the surface of AI.

Like any tool, the real value depends on how it is used, and on who is using it. It could be a crutch to a student, but it could also give just a little push that some scientist needs to make the next big discovery…

Even then I would strongly disagree with LAMMPS-GPT’s advice. Most beginners simulating a system with a concentration gradient along z would do so with a very anisotropic box ratio – my own nano electrolyte simulations routinely have boxes 70-100 angstroms long and 5-7 angstroms wide.

Using processors * * 1 would not work well in most such situations. Which highlights a conceptual weakness in LAMMPS-GPT: most of the problems people have using LAMMPS come from outside the code.

Another interesting problem is LAMMPS-GPT being trained, it seems, on LAMMPS’s source code. So I asked it to write me a line for conditionally splitting the box into 2 z-partitions on an even number of procs, but not on an odd number (useful for debugging a system on a workstation before sending it off to HPC). The response:

variable nprocs equal ${nprocs}  # Number of processors
variable halfprocs equal v_nprocs/2

if "(v_nprocs % 2) == 0" then "processors * * ${halfprocs}" else "processors * * *"

That threw me off because I’d never seen nprocs before and yet it seemed so feasible (indeed, maybe it should be a thermo variable). But no – LAMMPS-GPT clearly “read” about nprocs from the many places in the source code where it does, indeed, count procs, and then mistakenly extrapolated to nprocs being a good script variable.

(For the curious, the correct version is extract_setting(world_size)).

1 Like

My mental model is that they are Large Language Models. They are good tools to help us with language difficulties.

For example, I would welcome autonomous moderation agents that flag posts – for example for extremely vague requests, out-of-context posts, source code that isn’t backtick-fenced, and so on. These forums are spaces where we primarily use language to solve problems together and anything that helps us better use language to do so could help.

On the other hand, while a difficulty with using the LAMMPS forum could largely be communicative, a difficulty with using LAMMPS itself is almost always conceptual (and I started out facing many such difficulties myself!). Until we have evidence that these large language models are also dependable and deterministic large concept models – and, in particular, can model the continuum between not fully grasping a concept and overthinking it (perfectionism, ahem, expertise), as teachers often have to – I don’t think they will be quite so useful.

1 Like

I totally understand the feeling and can agree that ChatGPT becomes more impressive with every release. But I think part of my rejection also comes from my understanding of how such models are designed.

OpenAI’s GPT models are pre-trained on very large[1] databases, using statistical methods, to be statistical predictors. As such, I think they can’t be specialized and performant at the same time. I’ve seen people say that when their companies tried to train specialized models on over 20 years of digitized working documents, this wasn’t enough to get something remotely useful. So, in my opinion, this puts the bar way too high for any specialized (or professional) application.

Besides, these statistical predictors have absolutely no abstract comprehension of what they are saying. GPT-4o can tell that a series of numbers is a look-and-say sequence and state the rule of the sequence, yet, in my experience, it cannot get the terms right past the 5th or 6th one, and moreover, when detailing its reasoning, it introduces even more errors. Yet this is a task a 5-year-old could understand, and this sequence is heavily documented in the mathematics literature. What can we expect of tasks requiring a higher order of abstraction? This is the kind of problem the “symbolic” school of AI tried to tackle, with very mixed success, before the back-propagation breakthrough and the use of GPUs for training. This is also why I think AI cannot be helpful with the kind of problems @srtee puts forward in the discussion.

Anyway, these are the reasons I perceive current state-of-the-art AI to be very inelegant in its design and way too expensive to be something I can rally behind.


  1. In addition to the ludicrous cost of training GPT-5, it appears that the whole internet is hardly enough data for this next generation. People are now generating fake data from previous models. This baffles me. ↩︎

I find this understandable, but to put it in another perspective, it is estimated that it cost up to $1 million (in today’s dollars) to create the first light bulb (How Much Money Did It Take to Invent the Light Bulb?). I consider AI to be in its infancy; there are surely inefficiencies that will be solved by future innovations and economies of scale, just as the light bulb is now nearly a million times cheaper than the very first one.

Adding the accumulated wisdom and expertise from years of MatSci LAMMPS posts would greatly improve the model, but that would require me to get a Discourse API key from the admin @akohlmey, which is unlikely.

I found these custom GPTs to be somewhat useful, certainly more specific than something trained from “the internet”.

I was recently in a similar state of mind as others, but I didn’t want to be the old guy who refuses to adopt new technology. So far my recent experience with coding and debugging has been that stuff that used to take me a week to get done is now accelerated to 1-2 days, so my increase in productivity has been spectacular.

The main issue is the context window: you can only work on a chunk of code up to approximately 1000 lines at a time; anything more and the model starts dropping bits all over the place.

Overall it’s like talking to Einstein reincarnated in a goldfish with Alzheimer’s of what happened 5 minutes ago but still “almost useful”.

@Germain if you say a wrong answer was given, then the model is only parroting a weakness in the LAMMPS documentation. In that case you should contribute a PR to lammps/doc/src that improves the documentation as needed. Or suggest other materials to include in the custom GPT (an open-access MD textbook, for example).

1 Like

I have no problem with that. You would have to contact the MatSci admins for how to get an archive of all the LAMMPS posts. I have no idea how to do that. I know that Discourse has an export and an import feature. For the time being, I can also provide you with a complete archive of the lammps-users mailing list.

I do, however, firmly believe that this is a waste of time. The signal-to-noise ratio is just too bad, so I believe that the data, without proper curation, will lead to worse responses.
If you were to actually create a properly curated “questions and answers” collection, we would gladly post it on the LAMMPS website. I have suggested this many times, but there have been no takers so far. I did the same myself when I was a beginner with a different software package; it helped me a lot, and the section is still in that software’s manual with only minor changes (even though I stopped working on that software over 15 years ago).

Keep in mind that many posts on the mailing list and forum provide “bad” inputs and - for reasons unknown to me - those seem to be copied more often than “good” inputs. Also, responses are often just suggestions or a few lines, not a complete input. Especially in the times of the mailing list, many discussions went off-list, so there is no feedback. This was one of the motivations to move to a forum and retire the mailing list. But even on the forum, only a subset of people confirm whether suggestions were useful.

Also, if you feed it too much data from the mailing list archive, you may get too much “angry-mode Axel”. My current position in the LAMMPS community requires me to represent the entire community. Back then I was just representing myself and thus had much less patience with incompetent or ignorant people and would tell people what I think. Now, I just resort to ignoring people whose behavior upsets me.

My experience is different. For the most part, the suggested code changes (on C++ code) would not solve problems but remove the feature causing them, which is not at all desired. The suggestions on LAMMPS input would only be correct for trivial things, but often wildly incorrect for more complex issues. Not something I would recommend, especially for beginners.

I have seen in person the disastrous effect that AI assistants and bots have on training beginners (in parallel programming and in using LAMMPS) and cannot recommend this at all. The main problem is how deceptively confident their output is.

It is far more effort to convince somebody of their mistakes when they have consulted a chatbot and taken its advice as gospel. In some (research) cultures this has progressed to a level that has stopped me from wanting to teach at workshops and tutorials anymore.