Advice for building LAMMPS with SVE support on A64FX

Dear LAMMPS community,

I am hoping to get some advice. I am currently trying to build and tune LAMMPS on an HPC system that uses Fujitsu’s A64FX chip. The main goal is to build with Scalable Vector Extension (SVE) support to take advantage of this CPU architecture.

So far, I’ve successfully built LAMMPS with GCC 11, the ARM compilers version 21, and the Fujitsu compilers (each combined with OpenMPI 4.1.1), using the appropriate flags for SVE support.

The resulting lmp_mpi binary runs successfully, but it appears that nothing has been vectorized.

Does anyone know if LAMMPS supports SVE? And if so, could anyone provide some advice on how to get this working?

Here is an example build script (in this case with ARM + OpenMPI):

#!/usr/bin/env bash

CC=mpicc
CXX=mpic++
FC=mpif90

export CFLAGS="-O3 -mcpu=a64fx -armpl"
export CXXFLAGS="${CFLAGS}"
export FCFLAGS="${CFLAGS}"
export FFLAGS="${CFLAGS}"

module load cmake ffmpeg arm-modules/21.1 openmpi/arm21/4.1.1 libpng/gcc/1.6.37

source /lustre/software/kim-api/2.2.1/bin/kim-api-activate

cmake -DCMAKE_INSTALL_PREFIX:PATH=/lustre/software/lammps/arm21/29Sep2021  \
-D BUILD_SHARED_LIBS=on -D PKG_KIM=on -D DOWNLOAD_KIM=no -D PKG_MANYBODY=yes -D PKG_MOLECULE=yes \
-D PKG_RIGID=yes -D BUILD_MPI=yes -D BUILD_OMP=yes -D PKG_USER-OMP=yes -D PKG_OPT=yes -D PKG_PYTHON=yes \
-D PKG_COLLOID=yes -D PKG_COMPRESS=yes -D PKG_CORESHELL=yes -D PKG_DIPOLE=yes -D PKG_GRANULAR=yes \
-D PKG_KSPACE=yes -D PKG_POEMS=yes -D PKG_REPLICA=yes -D PKG_PERI=on -D PKG_DRUDE=on -D PKG_BODY=on \
-D PKG_SHOCK=yes -D PKG_SRD=yes -D LAMMPS_MACHINE=mpi -D PKG_REAXFF=on -D PKG_MISC=on \
-D CMAKE_CXX_COMPILER=${CXX} -D CMAKE_C_COMPILER=${CC} -D CMAKE_Fortran_COMPILER=${FC} \
-D CMAKE_CXX_FLAGS="${CXXFLAGS}" -D CMAKE_C_FLAGS="${CFLAGS}" -D CMAKE_Fortran_FLAGS="${FFLAGS}" ../cmake

Any advice is most welcome.
Thanks!
Dave

There is very little functionality in LAMMPS that will benefit from vectorization, and very little code that was written with explicit vector directives.

In particular, the main data structures for positions and forces do not easily lend themselves to vectorization (because of the Array of Arrays layout). The OPENMP and the INTEL packages try to alleviate this by using typecasting to convert this to an Array of Structs layout. But only the INTEL package has made the significant additional modifications and transformations of the data structures that facilitate substantial vectorization using the Intel compiler.
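To make the layout point concrete, here is a minimal, self-contained sketch (not actual LAMMPS source). It assumes positions are reached through a `double **` whose row pointers all point into one contiguous buffer, and that a three-member struct, here called `dbl3_t`, can be overlaid on that buffer. The helper names `fill_positions`, `as_structs`, `sum_r2_ptrs` and `sum_r2_structs` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative 3-component struct; the cast below assumes it has no padding.
struct dbl3_t { double x, y, z; };
static_assert(sizeof(dbl3_t) == 3 * sizeof(double), "dbl3_t must be tightly packed");

// Build an "array of arrays" view: rows[i] points at the i-th triple of one
// contiguous buffer. Atom i is placed at (i, 2i, 3i) purely as test data.
void fill_positions(std::size_t n, std::vector<double> &data, std::vector<double *> &rows) {
  assert(n > 0);
  data.resize(3 * n);
  rows.resize(n);
  for (std::size_t i = 0; i < n; ++i) {
    rows[i] = data.data() + 3 * i;
    rows[i][0] = 1.0 * i;
    rows[i][1] = 2.0 * i;
    rows[i][2] = 3.0 * i;
  }
}

// Reinterpret the contiguous storage behind x[0] as an array of structs.
// This is only valid because every row points into the same buffer; it is a
// common practical idiom, but not something the C++ standard strictly guarantees.
const dbl3_t *as_structs(double **x) {
  return reinterpret_cast<const dbl3_t *>(x[0]);
}

// Sum of squared distances to atom 0, going through the pointer-to-pointer view.
// Each access is a double indirection, which the compiler cannot easily prove
// to be contiguous.
double sum_r2_ptrs(double **x, std::size_t n) {
  double s = 0.0;
  for (std::size_t i = 1; i < n; ++i) {
    const double dx = x[i][0] - x[0][0];
    const double dy = x[i][1] - x[0][1];
    const double dz = x[i][2] - x[0][2];
    s += dx * dx + dy * dy + dz * dz;
  }
  return s;
}

// Same computation through the array-of-structs view: one base pointer and a
// fixed stride of three doubles per atom.
double sum_r2_structs(const dbl3_t *x, std::size_t n) {
  double s = 0.0;
  for (std::size_t i = 1; i < n; ++i) {
    const double dx = x[i].x - x[0].x;
    const double dy = x[i].y - x[0].y;
    const double dz = x[i].z - x[0].z;
    s += dx * dx + dy * dy + dz * dz;
  }
  return s;
}
```

Both functions compute the same value. The struct view mainly removes one level of indirection and makes the stride explicit. Because x, y and z are still interleaved in memory, the loop is not automatically vector-friendly, which is consistent with the further data-structure changes mentioned above for the INTEL package.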

For the most part, vectorizing is not worth the effort unless the vectors are so large that it would make sense to look into using OpenCL via the GPU package.

Okay, thank you very much for your reply!