I have gotten significant speed-ups on Skylake CPUs with USER-INTEL. However, on Cascade Lake CPUs the same runs are about 40% slower than on Skylake, both for a Lennard-Jones fluid and for a large solvated peptide system. These are single-node simulations on different Linux clusters, all running RHEL 7.8 and Slurm.
The host processor on the login node is Broadwell. Here is my CMake build (LAMMPS adds -xHost -qopenmp -restrict):
wget https://github.com/lammps/lammps/archive/patch_4Feb2020.tar.gz
module purge
module load intel/18.0/64/18.0.3.222
module load intel-mpi/intel/2018.3/64
cmake3 -D CMAKE_INSTALL_PREFIX=$HOME/.local -D LAMMPS_MACHINE=perseus_uintel -D ENABLE_TESTING=yes \
  -D BUILD_MPI=yes -D BUILD_OMP=yes -D CMAKE_CXX_COMPILER=icpc -D CMAKE_BUILD_TYPE=Release \
  -D CMAKE_CXX_FLAGS_RELEASE="-Ofast -axCORE-AVX512 -DNDEBUG" \
  -D PKG_USER-OMP=yes -D PKG_MOLECULE=yes -D PKG_RIGID=yes -D PKG_MISC=yes \
  -D PKG_KSPACE=yes -D FFT=MKL -D FFT_SINGLE=yes \
  -D PKG_USER-INTEL=yes -D INTEL_ARCH=cpu -D INTEL_LRT_MODE=threads …/cmake
make -j 10
make test
make install
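Because -xHost targets the instruction set of the machine doing the compiling, and this build runs on a Broadwell login node, it may be worth confirming which AVX-512 subsets each node actually reports. A minimal sketch, assuming Linux nodes that expose CPU flags in /proc/cpuinfo; the function name avx512_flags is hypothetical:

```shell
#!/bin/sh
# Print the AVX-512 feature flags found in a cpuinfo-style file, one per line.
# Assumption: Linux node; CPU flags appear on a line starting with "flags".
avx512_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # First "flags" line only (all cores report the same set), split into
    # words, keep avx512* entries, sort in the C locale for stable output.
    grep -m 1 '^flags' "$cpuinfo" | tr ' ' '\n' | grep '^avx512' | LC_ALL=C sort -u
}

# Usage: run once on the login node and once on a compute node
# (for example from inside a batch job), then compare the two lists.
# avx512_flags
```

On a Broadwell node this prints nothing; Skylake-SP and Cascade Lake nodes should list avx512f among others, and Cascade Lake additionally reports avx512_vnni.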
Here is a different build using make:
wget https://github.com/lammps/lammps/archive/stable_3Mar2020.tar.gz
module load intel/19.1/64/19.1.1.217
module load intel-mpi/intel/2019.7/64
Compiler settings from my machine Makefile:
SHELL = /bin/sh
CC = mpicxx -std=c++11
OPTFLAGS = -xCORE-AVX512 -O3 -fp-model fast=2 -no-prec-div -qoverride-limits \
           -qopt-zmm-usage=high
CCFLAGS = -qopenmp -qno-offload -ansi-alias -restrict \
          -DLMP_INTEL_USELRT -DLMP_USE_MKL_RNG $(OPTFLAGS) \
          -I$(MKLROOT)/include
SHFLAGS = -fPIC
DEPFLAGS = -M
LINK = mpicxx -std=c++11
LINKFLAGS = -qopenmp $(OPTFLAGS) -L$(MKLROOT)/lib/intel64/
LIB = -ltbbmalloc -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
SIZE = size
ARCHIVE = ar
ARFLAGS = -rc
SHLIBFLAGS = -shared
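Since this Makefile requests -xCORE-AVX512 with -qopt-zmm-usage=high, one hedged sanity check is whether the resulting executable contains instructions that use the 512-bit zmm registers at all. A minimal sketch, assuming GNU objdump (default AT&T syntax) is available; the function name count_zmm_lines is hypothetical, and a nonzero count only shows that AVX-512 code exists in the binary, not which code path is taken at run time:

```shell
#!/bin/sh
# Count disassembly lines that reference a 512-bit zmm register.
# Reads objdump-style (AT&T syntax, registers prefixed with %) text on stdin.
count_zmm_lines() {
    # grep -c prints 0 but exits 1 when nothing matches; keep exit status 0.
    grep -c '%zmm' || true
}

# Usage (assumption: the binary is the one launched in the Slurm script):
# objdump -d $HOME/.local/bin/lmp_cascade | count_zmm_lines
```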
Here is a sample Slurm script:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --mem=10G
#SBATCH --time=00:02:00
module load intel/19.1/64/19.1.1.217
module load intel-mpi/intel/2019.7/64
srun $HOME/.local/bin/lmp_cascade -sf omp -sf intel -in in.melt
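The script above leaves OMP_NUM_THREADS unset, so the number of OpenMP threads per MPI task is left to defaults. A minimal sketch of making it explicit, assuming a single-node job in which Slurm exports SLURM_CPUS_ON_NODE and SLURM_NTASKS; the helper name threads_per_task is hypothetical:

```shell
#!/bin/sh
# Compute OpenMP threads per MPI task from the Slurm allocation (minimum 1).
# Assumption: single-node job; SLURM_CPUS_ON_NODE and SLURM_NTASKS are set
# by Slurm, otherwise both fall back to 1.
threads_per_task() {
    cpus="${SLURM_CPUS_ON_NODE:-1}"
    tasks="${SLURM_NTASKS:-1}"
    [ "$tasks" -ge 1 ] || tasks=1      # avoid division by zero
    n=$(( cpus / tasks ))              # integer division
    [ "$n" -ge 1 ] || n=1              # never request zero threads
    echo "$n"
}

# In the batch script, before the srun line:
# export OMP_NUM_THREADS="$(threads_per_task)"
```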
Any thoughts on why the newer-generation Cascade Lake processors perform worse here?
Jon