Fix spring/self and Kokkos acceleration

LorenzoPiersante · September 18, 2025, 1:11pm

Hello,

Lately I have been working with the fix ti/spring with a version of LAMMPS I compiled with KOKKOS acceleration for GPU.

The adiabatic switching from the side of the real Hamiltonian is not problematic. I obtained results comparable to data available in the literature. My big problem has been the switching from the Einstein crystal side. In brief, I always had huge oscillations of the temperature and couldn’t quite equilibrate the system.

In order to identify better equilibration strategies for the Einstein crystal, I carried out several tests with the fix spring/self together with a Langevin thermostat (the one recommended for ti/spring).

Here is the input script:

# commands to set up KOKKOS
newton off
package kokkos newton off neigh full

units metal
atom_style atomic
boundary p p p

read_data hcp_rescaled.dat

# constant volume simulation
fix normal_dynamics all nve

# fix the spring force
fix einstein_spring all spring/self 1.36697

# compute the temperature without the CM contribution
compute temp_no_cm all temp/com

# custom thermo output
thermo_style custom step temp vol pe ke etotal f_einstein_spring

thermo 100

# instantiate random velocities
velocity all create 100.0 4928459 mom yes rot yes

# ramp-up run
fix equilibrate_temp all langevin 100.0 473.0 $(50.0 * dt) 86 zero yes
fix_modify equilibrate_temp temp temp_no_cm
run 100000

At this point I tried out two simulations:

one with KOKKOS on a GPU: lmp -k on g 1 -sf kk -in file.lammps
one with plain CPU LAMMPS and MPI: mpirun -np 12 lmp -in test.lammps

The behaviour of the two simulations is dramatically different. I attach the respective log files and a plot of the temperature as a function of steps.
temp_kokkos_gpu.pdf (21.8 KB)
temp_cpu.pdf (22.5 KB)
log_kokkos_gpu.lammps (102.5 KB)
log_cpu.lammps (102.3 KB)

The only difference between the two input files is the first two lines, which I remove when running on CPU.

I read the documentation regarding the package KOKKOS and the various optional arguments of package, but I couldn’t figure out the issue.

I considered the possibility of round-off errors due to the GPU (RTX 3070), so I tried to run on an H100 GPU on the compute cluster. The result was essentially the same.

If anyone has any ideas, please let me know.

Thanks in advance,
Lorenzo

akohlmey · September 18, 2025, 1:31pm

Before reporting issues like this one, you should always first check if the same issue still exists with the latest LAMMPS version, or at least the latest update to the stable version.
Your version 19 Nov 2024 is a bit outdated and it so happens that there have been changes to the KOKKOS version of fix spring/self after that date.

git show 50df32f6fe3f85406962d2bfe1639b3e5f010195 KOKKOS/fix_spring_self_kokkos.cpp
  commit 50df32f6fe3f85406962d2bfe1639b3e5f010195
  Author: Stan Moore <[email protected]>
  Date:   Tue Dec 3 16:20:34 2024 -0800
  
      Fix issues in KOKKOS package
  
  diff --git a/src/KOKKOS/fix_spring_self_kokkos.cpp b/src/KOKKOS/fix_spring_self_kokkos.cpp
  index 1b6d45ead7..59b9a49ee8 100644
  --- a/src/KOKKOS/fix_spring_self_kokkos.cpp
  +++ b/src/KOKKOS/fix_spring_self_kokkos.cpp
  @@ -123,7 +123,7 @@ void FixSpringSelfKokkos<DeviceType>::post_force(int /*vflag*/)
     auto l_yflag = yflag;
     auto l_zflag = zflag;
   
  -  Kokkos::parallel_reduce(nlocal, LAMMPS_LAMBDA(const int& i, double& espring_kk) {
  +  Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType>(0,nlocal), LAMMPS_LAMBDA(const int& i, double& espring_kk) {
       if (l_mask[i] & l_groupbit) {
         Few<double,3> x_i;
         x_i[0] = l_x(i,0);

LorenzoPiersante · September 18, 2025, 1:36pm

Thanks for pointing that out. This is the version our cluster administrators recommend. I will try to compile the latest stable version and provide updates.

akohlmey · September 18, 2025, 1:52pm

This is an odd choice. We specifically release stable versions and provide bugfix-only updates to them for the purpose of packaging and system-wide installation. While we try to keep LAMMPS working correctly at all times, the chances of having bugs due to recently added features are higher in feature release versions compared to stable versions.

If there had been any need to install a feature release version, it would have been obsoleted by the following stable release, which happened in this case in July 2025.

LorenzoPiersante · September 18, 2025, 2:09pm

Hello,

I git cloned the release branch of LAMMPS, compiled it with KOKKOS acceleration, and re-ran the test (same input script). Here I provide the results.
temp_stable_gpu_kokkos.pdf (21.8 KB)
log_stable_gpu_kokkos.lammps (102.6 KB)
The behaviour is basically the same.

Any input on the matter is welcome.

(I will check the stable branch as well)

Thanks in advance,
Lorenzo

akohlmey · September 18, 2025, 2:31pm

No need.

Thanks. This looks a bit like there is a factor of 2 missing somewhere.
Will have a closer look when I get to the office and a machine with a proper KOKKOS capable GPU.

LorenzoPiersante · September 18, 2025, 2:32pm

Thanks a lot!

akohlmey · September 18, 2025, 4:05pm

Can you please try to run the jobs again but without fix langevin and then post the outputs?

LorenzoPiersante · September 18, 2025, 4:13pm

Hello,

I reran the input file with the two lines before the run command commented out. Here are the results.
The system just oscillates back and forth as expected. Do you believe there is a problem with Langevin?
temp_no_langevin.pdf (20.7 KB)
log_no_langevin.lammps (102.5 KB)

akohlmey · September 18, 2025, 5:07pm

Not a problem, but you cannot exactly reproduce fix langevin results across different compilations and between plain fixes and KOKKOS fixes because of the different random number sequences.

I get the exact same forces from fix spring/self with and without KOKKOS, and in KOKKOS with and without GPU. But as soon as I add fix langevin to the mix, the results change due to the different pRNG results.

When thermostatting with fix nvt instead of fix nve + fix langevin, the results from plain LAMMPS and LAMMPS with KOKKOS on GPU or OpenMP are also identical (for short trajectories).
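
To illustrate the point about random number sequences, here is a toy sketch in plain Python (not LAMMPS code). It integrates independent 1D harmonic oscillators with a Langevin thermostat in reduced units (mass = kB = 1). The function name `run_langevin`, the parameters, and the integration scheme are all assumptions made for this illustration. Two different seeds should give different individual trajectories but a similar average temperature, while repeating a seed reproduces the run exactly.

```python
import math
import random


def run_langevin(seed, n_particles=200, n_steps=2000, dt=0.01,
                 k_spring=1.0, gamma=1.0, t_target=1.0):
    """Toy Langevin dynamics for independent 1D harmonic oscillators.

    Reduced units (mass = kB = 1). Returns the kinetic temperature averaged
    over the second half of the run, and the final positions.
    """
    rng = random.Random(seed)
    # Amplitude of the random force so that its variance is 2*gamma*T/dt.
    noise = math.sqrt(2.0 * gamma * t_target / dt)
    x = [0.0] * n_particles
    v = [0.0] * n_particles
    # Initial force: x = v = 0, so only the random part is non-zero.
    f = [noise * rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    temp_sum, n_samples = 0.0, 0
    for step in range(n_steps):
        for i in range(n_particles):
            v[i] += 0.5 * dt * f[i]          # first half kick
            x[i] += dt * v[i]                # drift
            # spring force + friction + random force
            f[i] = -k_spring * x[i] - gamma * v[i] + noise * rng.gauss(0.0, 1.0)
            v[i] += 0.5 * dt * f[i]          # second half kick
        if step >= n_steps // 2:
            # In 1D with m = kB = 1 the kinetic temperature is <v^2>.
            temp_sum += sum(vi * vi for vi in v) / n_particles
            n_samples += 1
    return temp_sum / n_samples, x


temp_a, x_a = run_langevin(seed=86)
temp_b, x_b = run_langevin(seed=12345)
print(f"seed 86:    <T> = {temp_a:.3f}")
print(f"seed 12345: <T> = {temp_b:.3f}")
print("largest difference in final positions:",
      max(abs(a - b) for a, b in zip(x_a, x_b)))
```

The comparison that makes sense between a plain and a KOKKOS run with fix langevin is therefore a statistical one (averages and fluctuations), not a step-by-step one.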

stamoor · September 18, 2025, 5:29pm

You can also compile with -DLMP_KOKKOS_DEBUG_RNG to debug fix langevin/kk and it should give the exact same string of random numbers and hence the same trajectory as non-Kokkos, but you must run on a CPU with the Kokkos Serial backend.

LorenzoPiersante · September 19, 2025, 7:37am

Good morning, I understand that the results of Langevin cannot be the same on CPU and GPU due to the different pRNGs. However, I find it strange that the CPU implementation ramps up the temperature from 100 K to 473 K during the run, whereas the KOKKOS version does not. Shouldn’t the fix just carry out a linear scaling of T over the span of a run, irrespective of the platform?
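
For reference, the linear ramp in question can be sketched as below. This is a minimal illustration of linear interpolation between Tstart and Tstop over one run, as I understand the documented behaviour of ramped thermostats. The function name `target_temperature` is hypothetical, and the numbers are the ones from the input script (100.0 to 473.0 over 100000 steps).

```python
def target_temperature(step, begin_step, end_step, t_start, t_stop):
    """Thermostat target temperature at `step` for a linear ramp over one run."""
    fraction = (step - begin_step) / (end_step - begin_step)
    return t_start + fraction * (t_stop - t_start)


# Values from the input script: ramp from 100.0 to 473.0 over 100000 steps.
for step in (0, 25000, 50000, 75000, 100000):
    print(step, target_temperature(step, 0, 100000, 100.0, 473.0))
```

Halfway through the run the target is 286.5 K. The instantaneous temperature fluctuates around this value, but its running average should follow the ramp on any platform.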

stamoor · September 19, 2025, 2:37pm

Yes, can you make a plot that shows the discrepancy? I will try to take a look at the code and posted test case on Monday too.
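
One way to pull the numbers for such a plot out of the log files is sketched below. It assumes that a thermo block starts with a header line beginning with `Step` and ends at the `Loop time` line. The helper name `read_thermo` is hypothetical, and the sample text is made-up placeholder data, not taken from the attached logs.

```python
def read_thermo(log_text, columns=("Step", "Temp")):
    """Return a list of tuples with the requested thermo columns from a log."""
    rows = []
    header = None
    for line in log_text.splitlines():
        fields = line.split()
        if header is None:
            # Look for the header line that names the thermo columns.
            if fields and fields[0] == "Step" and all(c in fields for c in columns):
                header = fields
            continue
        if line.startswith("Loop time"):
            header = None  # end of this thermo block
            continue
        if len(fields) != len(header):
            continue  # skip warnings and other non-data lines
        try:
            values = [float(field) for field in fields]
        except ValueError:
            continue
        rows.append(tuple(values[header.index(c)] for c in columns))
    return rows


# Made-up placeholder log text, only to show the expected layout.
sample_log = """\
   Step          Temp          Volume         PotEng         KinEng         TotEng     f_einstein_spring
         0   100.0   1000.0   -50.0   1.0   -49.0   0.0
       100   98.5    1000.0   -49.9   0.9   -49.0   0.1
Loop time of 1.0 on 1 procs for 100 steps with 64 atoms
"""

for step, temp in read_thermo(sample_log):
    print(int(step), temp)
```

Running this on both the CPU and the KOKKOS log and plotting the two temperature series against the step number would show directly whether each run follows the ramp.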

akohlmey · September 19, 2025, 3:02pm

It will do the ramp, if you disable the temperature bias. Since your system is stationary, it should not really be needed anyway, but it is worrisome that there is no error or warning. I suspect this may be related to compute temp/com not being ported to KOKKOS yet.

stamoor · September 22, 2025, 7:53pm

I can reproduce this, and it is a bug in KOKKOS. It works as expected if you turn on CUDA UVM. I haven’t found the cause yet.

LorenzoPiersante · September 24, 2025, 7:21am

I apologize for the slow reply. I have attached some pdfs with the plots to my previous messages, in case you haven’t tried it out yet.

LorenzoPiersante · September 24, 2025, 7:21am

Hello, do you mean that the bug is with temp/com?

stamoor · September 24, 2025, 5:09pm

I think the bug is in fix langevin/kk. There is a missing data transfer between GPU and CPU, but I haven’t pinpointed the exact location yet.