This is just a standard LJ potential, but using the OpenKIM implementation.
For comparison, please try running the in.lj benchmark input from the bench folder, but run it with:
mpiexec -n 1 lmp -in in.lj.lmp -v x 2 -v y 2 -v z 2
and then vary the number of MPI processes (the -n argument) in the sequence 1, 2, 4, 8, 16.
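The scan can be scripted with a loop like the one below. This is a minimal sketch, assuming lmp and mpiexec are in your PATH and in.lj.lmp is in the current directory; the function names, the log.mpi.* file names, and the LAUNCH dry-run variable are hypothetical choices for illustration. The -log switch gives each run its own log file.

```shell
#!/bin/sh
# Hypothetical helpers: run the scaled LJ benchmark (x/y/z = 2, i.e. 256000 atoms)
# with 1, 2, 4, 8 and 16 MPI ranks, one log file per run.
# Set LAUNCH=echo for a dry run that only prints the commands.
run_mpi_scan() {
  for np in 1 2 4 8 16; do
    ${LAUNCH:-} mpiexec -n "$np" lmp -in in.lj.lmp -v x 2 -v y 2 -v z 2 \
      -log "log.mpi.$np"
  done
}

# Print the loop time, performance and CPU use lines of every run, in order.
summarize_mpi_scan() {
  grep -h -e '^Loop time' -e '^Performance:' -e 'CPU use with' \
    log.mpi.1 log.mpi.2 log.mpi.4 log.mpi.8 log.mpi.16
}
```

Usage: `run_mpi_scan && summarize_mpi_scan`.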
On my desktop running Linux with an AMD Ryzen 7 7840HS CPU (8 physical cores with hyperthreading) I get:
Loop time of 7.01462 on 1 procs for 100 steps with 256000 atoms
Performance: 6158.570 tau/day, 14.256 timesteps/s, 3.650 Matom-step/s
99.6% CPU use with 1 MPI tasks x 1 OpenMP threads
--
Loop time of 3.64519 on 2 procs for 100 steps with 256000 atoms
Performance: 11851.224 tau/day, 27.433 timesteps/s, 7.023 Matom-step/s
99.5% CPU use with 2 MPI tasks x 1 OpenMP threads
--
Loop time of 1.98606 on 4 procs for 100 steps with 256000 atoms
Performance: 21751.593 tau/day, 50.351 timesteps/s, 12.890 Matom-step/s
99.4% CPU use with 4 MPI tasks x 1 OpenMP threads
--
Loop time of 1.18628 on 8 procs for 100 steps with 256000 atoms
Performance: 36416.382 tau/day, 84.297 timesteps/s, 21.580 Matom-step/s
99.3% CPU use with 8 MPI tasks x 1 OpenMP threads
--
Loop time of 0.939385 on 16 procs for 100 steps with 256000 atoms
Performance: 45987.518 tau/day, 106.453 timesteps/s, 27.252 Matom-step/s
98.7% CPU use with 16 MPI tasks x 1 OpenMP threads
To get more than 100% CPU usage, you need to use multi-threading; in this case, by setting the environment variable OMP_NUM_THREADS to 2 and adding the -sf omp flag (the lj/cut pair style used in the lj benchmark is supported by the OPENMP package). With 1 to 8 MPI ranks this now looks as follows (the "procs" count in the output is MPI ranks times OpenMP threads):
Loop time of 3.30538 on 2 procs for 100 steps with 256000 atoms
Performance: 13069.608 tau/day, 30.254 timesteps/s, 7.745 Matom-step/s
199.0% CPU use with 1 MPI tasks x 2 OpenMP threads
--
Loop time of 1.79998 on 4 procs for 100 steps with 256000 atoms
Performance: 24000.205 tau/day, 55.556 timesteps/s, 14.222 Matom-step/s
198.8% CPU use with 2 MPI tasks x 2 OpenMP threads
--
Loop time of 1.10091 on 8 procs for 100 steps with 256000 atoms
Performance: 39240.313 tau/day, 90.834 timesteps/s, 23.254 Matom-step/s
198.3% CPU use with 4 MPI tasks x 2 OpenMP threads
--
Loop time of 1.04419 on 16 procs for 100 steps with 256000 atoms
Performance: 41371.734 tau/day, 95.768 timesteps/s, 24.517 Matom-step/s
186.5% CPU use with 8 MPI tasks x 2 OpenMP threads
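The hybrid MPI/OpenMP runs shown above can be scripted the same way. Again a sketch under the same assumptions (hypothetical function and log names, LAUNCH=echo for a dry run); env sets OMP_NUM_THREADS=2 only for the mpiexec call, and -sf omp makes LAMMPS use the OPENMP variant of the pair style.

```shell
#!/bin/sh
# Hypothetical helper: same benchmark with 2 OpenMP threads per MPI rank,
# using 1, 2, 4 and 8 MPI ranks (2 to 16 threads in total).
# Set LAUNCH=echo for a dry run that only prints the commands.
run_omp_scan() {
  for np in 1 2 4 8; do
    ${LAUNCH:-} env OMP_NUM_THREADS=2 mpiexec -n "$np" lmp -in in.lj.lmp -sf omp \
      -v x 2 -v y 2 -v z 2 -log "log.omp.$np"
  done
}
```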
Thus, since you are not using a pair style supported by the OPENMP package, you cannot reach more than 100% (within the accuracy of CPU usage accounting). If you get less than that, you may have a load imbalance (the CPU usage is printed for MPI rank 0 only), or there may be other (system) processes using the CPU (always possible on Windows).
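To check this on your own runs, a small sketch like the following (assuming a POSIX shell with awk; the function name cpu_use_ratio is made up for illustration and is not part of LAMMPS) divides the reported CPU use by 100% times the number of OpenMP threads, so a value close to 1.0 is the best you can expect:

```shell
#!/bin/sh
# Hypothetical helper (not part of LAMMPS): compare the "CPU use" line of a
# LAMMPS log with the ideal of 100% per OpenMP thread. The value reflects
# MPI rank 0 only; ratios well below 1.0 hint at load imbalance or at other
# processes competing for the CPU.
cpu_use_ratio() {
  # Expected format: "99.6% CPU use with 1 MPI tasks x 1 OpenMP threads"
  awk '/% CPU use with/ {
    use = $1
    sub(/%/, "", use)          # strip the trailing percent sign
    threads = $(NF - 2)        # the number right before "OpenMP threads"
    printf "%s MPI x %s threads: %.3f of ideal\n", $5, threads, use / (100 * threads)
  }' "$@"
}
```

Usage: `cpu_use_ratio log.mpi.16 log.omp.8` (with no file arguments it reads standard input).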