LAMMPS release 17 April 2024 much slower than an older release

Hi. With the following script, the LAMMPS version that ships with Ubuntu 22.04 (LAMMPS 29 Sep 2021 - Update 2) is more than twice as fast as LAMMPS 17 April 2024. Why? I compiled the latter from source with CMake and also tried the pre-compiled binaries for Linux; the results were the same. All runs used one process and one thread. The CPU is a Xeon E5-2680 v4. Thanks in advance.

# Lennard-Jones Argon

units		metal
atom_style	atomic
boundary    p p p

lattice		fcc 5.26
region		box block 0 10 0 10 0 10
create_box	1 box
create_atoms	1 box
mass		1 39.948

velocity	all create 100 87287

# LJ 12-6 potential
pair_style	lj/cut 8.5
pair_coeff	1 1 0.0104 3.4

neighbor	    0.3 bin
neigh_modify	every 1 delay 0 check no

variable    dt equal 0.002  # ps
variable    t equal step*v_dt  # ps

thermo		1
thermo_style custom v_t temp etotal

timestep    ${dt}
fix		    rlx all nve #temp 100 100 0.1
fix         nvt all temp/berendsen 100 100 0.02
run		    500

Do you also see the same slowdown with other inputs, e.g. the ones in the “bench” folder of the LAMMPS source distribution?

Which build type did you choose when configuring LAMMPS with CMake?
Try compiling with -DCMAKE_BUILD_TYPE=Release.

I hadn’t specified a build type, so CMake chose RelWithDebInfo automatically. Performance is now good after compiling with -DCMAKE_BUILD_TYPE=Release.
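
To put a number on the difference between two builds, one option is to compare the "Loop time of ... on ... procs for ... steps with ... atoms" line that LAMMPS prints at the end of each run. The following is only a minimal sketch: the function names `parse_loop_time` and `speedup` are hypothetical helpers, and the timings in the demo strings are made-up placeholders, not measurements from this thread.

```python
import re

# Matches the summary line LAMMPS prints after a run, e.g.
# "Loop time of 2.0 on 1 procs for 500 steps with 4000 atoms"
LOOP_TIME_RE = re.compile(
    r"Loop time of\s+([0-9.eE+-]+)\s+on\s+(\d+)\s+procs"
    r"\s+for\s+(\d+)\s+steps\s+with\s+(\d+)\s+atoms"
)


def parse_loop_time(log_text):
    """Extract wall time, process count, steps and atoms from log text."""
    match = LOOP_TIME_RE.search(log_text)
    if match is None:
        raise ValueError("no 'Loop time' line found in log text")
    seconds = float(match.group(1))
    steps = int(match.group(3))
    return {
        "seconds": seconds,
        "procs": int(match.group(2)),
        "steps": steps,
        "atoms": int(match.group(4)),
        "steps_per_second": steps / seconds,
    }


def speedup(slow_log, fast_log):
    """Ratio of loop times: values above 1 mean fast_log ran faster."""
    return parse_loop_time(slow_log)["seconds"] / parse_loop_time(fast_log)["seconds"]


if __name__ == "__main__":
    # Placeholder log lines with invented timings, for illustration only.
    debug_info_build = "Loop time of 2.0 on 1 procs for 500 steps with 4000 atoms"
    release_build = "Loop time of 1.0 on 1 procs for 500 steps with 4000 atoms"
    print(parse_loop_time(release_build))
    print("speedup:", speedup(debug_info_build, release_build))
```

In practice the strings would come from reading the log.lammps file of each run, with the same input and the same process and thread counts for both builds.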

Thanks for the feedback. This is an unusually large slowdown for a rather minor change in the compilation options. On my laptop, for example, your input runs at essentially the same speed in both cases.

I suspect the difference you see is due to something exceeding the CPU cache in one case and fitting into it in the other. My laptop’s CPU likely has a larger cache than yours and is thus not affected. Here is the relevant output of the lscpu command on my laptop:

Caches (sum of all):      
  L1d:                    192 KiB (4 instances)
  L1i:                    128 KiB (4 instances)
  L2:                     5 MiB (4 instances)
  L3:                     8 MiB (1 instance)
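
As a rough plausibility check for the cache hypothesis, the size of the main per-atom data can be estimated from the input script and compared with the cache sizes that lscpu reports. This is a back-of-the-envelope sketch only. It assumes 8-byte doubles, 4-byte neighbor indices and a half neighbor list, and it ignores ghost atoms and any other internal buffers, so the real memory footprint in LAMMPS will differ. The function name `estimate_working_set` is hypothetical.

```python
import math

# Values taken from the input script in this thread.
LATTICE_CONSTANT = 5.26  # Angstrom, from "lattice fcc 5.26"
CELLS_PER_SIDE = 10      # from "region box block 0 10 0 10 0 10"
CUTOFF = 8.5             # Angstrom, from "pair_style lj/cut 8.5"
SKIN = 0.3               # Angstrom, from "neighbor 0.3 bin"

# Assumptions, not stated anywhere in the thread.
BYTES_PER_DOUBLE = 8
BYTES_PER_INDEX = 4


def estimate_working_set():
    """Estimate atom count, neighbors per atom and array sizes in bytes."""
    # An fcc unit cell holds 4 atoms.
    n_atoms = 4 * CELLS_PER_SIDE ** 3
    density = 4 / LATTICE_CONSTANT ** 3  # atoms per cubic Angstrom

    # Neighbor lists are built out to cutoff + skin.
    r_list = CUTOFF + SKIN
    full_neighbors = density * (4.0 / 3.0) * math.pi * r_list ** 3
    half_neighbors = full_neighbors / 2  # assumed half list

    # Positions, velocities and forces: 3 doubles each per atom.
    per_atom_bytes = n_atoms * 3 * 3 * BYTES_PER_DOUBLE
    neighbor_bytes = n_atoms * half_neighbors * BYTES_PER_INDEX

    return {
        "n_atoms": n_atoms,
        "full_neighbors_per_atom": full_neighbors,
        "per_atom_bytes": per_atom_bytes,
        "neighbor_bytes": neighbor_bytes,
        "total_kib": (per_atom_bytes + neighbor_bytes) / 1024,
    }


if __name__ == "__main__":
    for name, value in estimate_working_set().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the neighbor list dominates the estimate, at several hundred KiB for this input. That total can be set against the per-instance L2 size and the L3 size shown by lscpu on each machine.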