Diverging trajectories

Following the previous bug report on fix_vector.cpp, I restarted a simulation with the patched version of LAMMPS (22 Jul 2025 - Update 4) and found that an already equilibrated sample quickly drifted to a different state. See the evolution of the density:

The orange line is the trajectory computed with the patched version (update 4), while the green and blue lines are computed with LAMMPS (22 Jul 2025).

The orange and green simulations come from identical input files and use the same restart file. The only difference is that they are executed with two different versions of LAMMPS (22Jul25 and 22Jul25_update4). The two LAMMPS executables are compiled with the same compiler, and the simulations are carried out on the same cluster. I also tested the published 22Jul25_update4 without the patch, and it gives the same results as the orange trajectory, so the difference is not due to the patch discussed in the previous post.

I am reporting this behaviour because this is a pretty vanilla setup with very well-established potential functions, so I wasn't expecting any dramatic difference between two very recent LAMMPS versions (just as I wouldn't question simulations carried out 20 years ago with LAMMPS using the same potentials). Anyway, when I compare the starting point of the two executables, this is what I see:

@@ -1,3 +1,3 @@
-LAMMPS (22 Jul 2025 - Update 4)
+LAMMPS (22 Jul 2025)
   using 1 OpenMP thread(s) per MPI task
 package intel 1
@@ -175,7 +175,7 @@
       pair build: half/bin/newton/tri/intel
       stencil: half/bin/3d/tri/intel
       bin: intel
-Per MPI rank memory allocation (min/avg/max) = 8.496 | 8.583 | 8.774 Mbytes
+Per MPI rank memory allocation (min/avg/max) = 8.26 | 8.346 | 8.531 Mbytes
    Step         TotEng         E_vdwl         E_coul         E_long         E_bond        E_angle        E_dihed        E_impro         KinEng         PotEng          Temp          Press          Volume        Density        v_msqdis        v_diff          CPU      
-   4000000  -189880.6       16398.021     -114802.43     -93860.196      639.61688      167.74865      0              0              1576.64       -191457.24      298.99946     -6681.2718      21698.441      2.060053       0              3.3333333e+15  0            
-   4005000  -189929.55      16872.598     -115377.21     -93854.541      655.80918      165.51629      0              0              1608.2721     -191537.82      304.99829     -466.14381      21136.17       2.1148551      0.74800541     2.4933514e-05  6.8582337    
+   4000000  -190009.31      16398.021     -114802.42     -93860.196      520.80711      157.84788      0              0              1576.64       -191585.95      298.99946     -6681.2697      21698.441      2.060053       0              3.3333333e+15  0            
+   4005000  -190052.61      16498.483     -114954.62     -93856.802      538.06563      151.57617      0              0              1570.6846     -191623.3       297.87006     -5220.7239      21617.197      2.0677953      0.48500297     1.6166766e-05  3.1489684
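The column-by-column comparison of the step-4000000 rows can be scripted. Below is a minimal Python sketch that copies the thermo header and the two step-4000000 rows from the log excerpt above (update 4 first, 22 Jul 2025 second) and reports every column that differs beyond a relative tolerance. The function name `differing_columns` and the 1e-5 tolerance are illustrative choices, not part of LAMMPS; the CPU column is skipped because it is timing, not physics.

```python
import math

# Thermo header and the step-4000000 rows copied from the log excerpt above.
HEADER = ("Step TotEng E_vdwl E_coul E_long E_bond E_angle E_dihed E_impro "
          "KinEng PotEng Temp Press Volume Density v_msqdis v_diff CPU")
UPDATE4 = ("4000000 -189880.6 16398.021 -114802.43 -93860.196 639.61688 167.74865 "
           "0 0 1576.64 -191457.24 298.99946 -6681.2718 21698.441 2.060053 0 3.3333333e+15 0")
STABLE = ("4000000 -190009.31 16398.021 -114802.42 -93860.196 520.80711 157.84788 "
          "0 0 1576.64 -191585.95 298.99946 -6681.2697 21698.441 2.060053 0 3.3333333e+15 0")


def differing_columns(header, row_a, row_b, rel_tol=1e-5, skip=("CPU",)):
    """Return {column: (a, b)} for columns whose values differ beyond rel_tol."""
    names = header.split()
    a_vals = [float(x) for x in row_a.split()]
    b_vals = [float(x) for x in row_b.split()]
    diffs = {}
    for name, a, b in zip(names, a_vals, b_vals):
        if name in skip:
            continue  # wall-clock timing, expected to differ between runs
        # tiny absolute tolerance for columns whose values sit near zero
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12):
            diffs[name] = (a, b)
    return diffs


for name, (a, b) in differing_columns(HEADER, UPDATE4, STABLE).items():
    print(f"{name:8s} update4={a:<12} 22Jul25={b:<12} delta={a - b:+.5f}")
```

With these rows only TotEng, PotEng, E_bond and E_angle are flagged, and the TotEng/PotEng offset (about 128.71) equals the E_bond offset (about 118.81) plus the E_angle offset (about 9.90). In other words, the whole energy difference at the restart step comes from the bonded terms.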

The takeaways:

  • The pairwise energy contributions (E_vdwl, E_coul, E_long) are the same. The virial must also be essentially the same, as the pressure at step 4000000 is practically identical (-6681.2718 vs -6681.2697).
  • E_bond and E_angle differ.
  • The density in 22Jul25_update4 increases quite rapidly (from 2.060053 to 2.1148551 within 5000 steps, compared with 2.0677953 for 22Jul25).
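The virial argument in the first takeaway can be made explicit with the pressure expression given in the LAMMPS compute pressure documentation, written here for d = 3 with N atoms, temperature T, volume V and virial W:

```latex
P = \frac{N k_B T}{V} + \frac{W}{3V},
\qquad W = \sum_{i} \mathbf{r}_i \cdot \mathbf{f}_i
\quad\Longrightarrow\quad
\Delta P = \frac{\Delta W}{3V} \quad \text{when } N,\ T,\ V \text{ are unchanged}
```

At step 4000000 both runs start from the same restart and report T = 298.99946 and V = 21698.441, so the kinetic term is identical and the pressure difference of about 0.002 (out of roughly -6681 in the log's pressure units) can only come from the virial.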

I am not sure what drives this change in the two trajectories, but it’s not caused by the patch to fix vector. If you want to reproduce this simulation, please use the input deck in this post.

Thank you,

Otello

I cannot reproduce this.

I took your posted restart and input, stripped the input to the attached version and ran it with pre-compiled static LAMMPS binaries from the GitHub release page and my current development branch binary with all recent bugfixes included and get identical energies and pressures (except for small expected differences due to the different FFT library in my development binary):

tobermorite_s02_3.in (3.5 KB)

log.develop (9.0 KB)
log.update4 (9.2 KB)
log.stable (9.2 KB)

Hi Axel, thank you for confirming. It's actually good news! It means that there must be some difference in the executables I am using on that HPC cluster; I guess it boils down to the combination of modules loaded and compilers used. I will try other combinations of modules/compilers, check against the same restart you used, and report back.

I would not rule out something simpler, like some differences in your input.

I suggest trying the lammps-linux static binaries as a reference. They are compiled with compilers known to work and do not use aggressive optimizations. You can then compare your local executables against them. They are also fully static (including the C library), so no local installation, module, or library should make a difference.
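One way to compare a local executable against the static reference binary is to diff the thermo output of both runs column by column. Below is a minimal Python sketch, assuming both runs use the same thermo_style and thermo output frequency; the script name compare_thermo.py and the log file names are hypothetical. It collects every thermo row between a header line starting with Step and the following "Loop time" line, matches rows by step, and reports the largest relative difference per column.

```python
import sys


def thermo_rows(log_text):
    """Collect thermo rows of all run blocks in a LAMMPS log, keyed by step."""
    rows, names = {}, None
    for line in log_text.splitlines():
        tokens = line.split()
        if tokens[:1] == ["Step"]:        # thermo header starts a block
            names = tokens
            continue
        if names is None:
            continue
        if line.startswith("Loop time"):  # end of the current run block
            names = None
            continue
        if len(tokens) != len(names):
            continue                      # warnings or other interleaved text
        try:
            values = [float(t) for t in tokens]
        except ValueError:
            continue                      # non-numeric line with matching width
        row = dict(zip(names, values))
        rows[int(row["Step"])] = row
    return rows


def worst_columns(ref_text, test_text, skip=("Step", "CPU")):
    """Largest relative difference per column over the steps both logs share."""
    ref, test = thermo_rows(ref_text), thermo_rows(test_text)
    worst = {}
    for step in sorted(ref.keys() & test.keys()):
        for name, a in ref[step].items():
            if name in skip or name not in test[step]:
                continue
            b = test[step][name]
            scale = max(abs(a), abs(b))
            rel = abs(a - b) / scale if scale > 0 else 0.0
            if rel > worst.get(name, (0.0, None))[0]:
                worst[name] = (rel, step)
    return worst


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_ref, open(sys.argv[2]) as f_test:
        result = worst_columns(f_ref.read(), f_test.read())
    for name, (rel, step) in sorted(result.items(), key=lambda kv: -kv[1][0]):
        print(f"{name:10s} max rel. diff {rel:.3e} at step {step}")
```

A typical call would be `python3 compare_thermo.py log.static log.local`. Tiny relative differences (for example from a different FFT library) are expected; a column that stands out by orders of magnitude at the very first step, such as E_bond here, points at the code that computes it.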


I managed to reproduce the behaviour with a freshly compiled LAMMPS update4, so the difference I observe seems to originate from the executable. The executables compiled with the GNU compiler produce the same results as your tests, and so does the static binary executed on the same platform (which is kind of obvious).

This is the configuration of the offending executable:

cd build

module purge
module load intel-oneapi/2024.2
module load intel-mpi/oneapi/2021.13
module load intel-mkl/2024.2

cmake3 -D CMAKE_INSTALL_PREFIX=$HOME/.local \
-D CMAKE_BUILD_TYPE=Release \
-D LAMMPS_MACHINE=intel \
-D ENABLE_TESTING=no \
-D BUILD_OMP=yes \
-D BUILD_MPI=yes \
-D CMAKE_CXX_COMPILER=icpx \
-D CMAKE_CXX_FLAGS_RELEASE="-Ofast -xHost -qopenmp -DNDEBUG" \
-D PKG_MOLECULE=yes -D PKG_RIGID=yes -D PKG_MISC=yes \
-D PKG_KSPACE=yes -D FFT=MKL -D FFT_SINGLE=yes -D EXTRA_DUMP=yes \
-D PKG_INTEL=yes -D INTEL_ARCH=cpu -D INTEL_LRT_MODE=threads ../cmake

The log of the simulation follows:

tobermorite_s02_3.log5 (10.4 KB)

The simulation was run with the -sf intel flag, so maybe the bug is in one of the INTEL package variants of the bond and angle potentials.

… or a bug in the Intel compiler miscompiling code. I have stopped trying to test LAMMPS with the Intel compilers, since outside of the INTEL package (which has been unmaintained since Mike Brown left Intel for Nvidia) there is no benefit over GCC or Clang (a friend of mine called the new LLVM-based Intel compilers “Clang, but with more and more interesting bugs”).

You should be able to run your executable compiled with Intel icpx but without -sf intel, and also to compile the INTEL package with GCC and then run with -sf intel. The results of these two tests should provide a strong hint about where to look more closely.
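The reasoning behind these two tests can be written down as a small decision table. The sketch below is a hypothetical helper, not part of LAMMPS: each argument records whether that run reproduces the reference trajectory (for example the one from the static binary), starting from the known fact that the icpx build run with -sf intel diverges. It also reflects the point made later in this thread that, with non-Intel compilers, the directive-based vectorization of the INTEL package is disabled, so a GCC build run with -sf intel does not exercise exactly the same code paths as the icpx build.

```python
from itertools import product


def where_to_look(icpx_without_sf_ok: bool, gcc_with_sf_intel_ok: bool) -> str:
    """Map the outcome of the two suggested runs to the most likely culprit.

    Known baseline: the icpx build run with -sf intel diverges from the reference.
    """
    if not icpx_without_sf_ok and not gcc_with_sf_intel_ok:
        return ("possibly two problems: icpx miscompiles code outside the INTEL package "
                "and the INTEL styles also misbehave with GCC")
    if not icpx_without_sf_ok:
        return "icpx miscompiles code outside the INTEL package"
    if not gcc_with_sf_intel_ok:
        return "bug in the INTEL package styles themselves, independent of the compiler"
    return ("only icpx-compiled INTEL package code diverges: a compiler bug or a problem "
            "in the vectorized code paths that are enabled only with Intel compilers")


# Print the table for all four possible outcomes of the two test runs.
for icpx_ok, gcc_ok in product((True, False), repeat=2):
    print(f"icpx w/o -sf intel ok={icpx_ok!s:5}  gcc w/ -sf intel ok={gcc_ok!s:5}  -> "
          f"{where_to_look(icpx_ok, gcc_ok)}")
```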

Thanks for the explanation, I wasn't aware of the poor state of the Intel compilers. Interestingly, the execution time is almost half that of the GCC build, but bogus results are too high a price to pay.

That is because with non-Intel compilers, the explicit directive-based vectorization support in the INTEL package is disabled. So the difference you see is what you gain from vectorization.


It is a bit of a lottery. Some versions are more reliable than others and it depends on what features in LAMMPS you use.