I am trying to use the makefile.aurora_kokkos that y’all recommended.
The mpicxx that comes as part of my Intel compilers is actually a wrapper around GNU g++ and doesn’t understand SYCL flags. When I tried to use the Intel compiler that actually does understand SYCL flags, I got an undefined reference error at the linking step:
make[1]: Entering directory '[...]/lammps/src/Obj_aurora_kokkos'
mpiicpc -cxx=icpx -g -O3 main.o -L. -llammps_aurora_kokkos -lkokkos -ldl -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -march=sapphirerapids -mtune=sapphirerapids -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend "-device 12.60.7" -L[...]/lammps/src/Obj_aurora_kokkos -o ../lmp_aurora_kokkos
bin/ld: [...]/libmkl_intel_thread.so: undefined reference to `__kmpc_atomic_fixed4_rd'
icpx: error: linker command failed with exit code 1 (use -v to see invocation)
Adding -lpthread -liomp5 got me past that. I expect this trouble arose because I asked for threaded MKL to support the KSPACE package.
I was able to get the makefile.aurora_kokkos working with some modifications.
I used a version of the compiler that’s as close to what Aurora has as I could find. I have 2023.1.0, but the executable names are different for some reason; mpicxx is no longer the right wrapper.
# ---------------------------------------------------------------------
# compiler/linker settings
# specify flags and libraries needed for your compiler
CC = mpiicpc -cxx=icpx
LINK = mpiicpc -cxx=icpx
LINKFLAGS = -g -O3 -fsycl-link-huge-device-code -fsycl-max-parallel-link-jobs=30
# ---------------------------------------------------------------------
# LAMMPS-specific settings, all OPTIONAL
# MPI library
MPI_LIB = -lpthread -liomp5
# FFT library
FFT_INC = -DFFT_MKL -DFFT_MKL_THREADS
FFT_PATH =
FFT_LIB = -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread
The executable works fine for a simple case, but when I try to run my Rhodopsin benchmark, which uses FFT, I get a segmentation fault. (It works fine in serial mode; the crash only happens on the GPU.)
The Intel link help suggests I need to link like this to enable GPU offloading: -fiopenmp -fopenmp-targets=spir64 -fsycl -L${MKLROOT}/lib/intel64 -lmkl_sycl_undefined -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lsycl -lstdc++ -lpthread -lm -ldl
As Stan indicated earlier, adding MKL FFT support for PPPM is a work-in-progress for the SYCL backend (should be straightforward, but we’ve had fun with other priorities). You could drop the kspace style and switch to something like lj/charmm/coul/charmm if you want to get a sense of how everything else in in.rhodo runs on your local setup.
The Kokkos version of KISS FFT was supposed to be the portable fallback that works with any backend, but as you know it is ironically broken because SYCL doesn’t support recursive functions on the device. There is also heFFTe, but it isn’t enabled for the KOKKOS package yet. You could run PPPM on the host CPU, but otherwise porting the MKL FFT is pretty much required at this point.
I found oneapi/mkl/dfti.hpp on my system, which provides compute_forward and friends. Would it be sufficient to swap out the include statement and the function calls?
That would be most of the work, but we would also need to get the SYCL queue from Kokkos and pass that into the calls. I don’t think any of this is very hard, and there are some examples provided by Intel, but it just hasn’t made it to the top of our to-do list.
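For what it’s worth, this is roughly what I mean by “get the SYCL queue from Kokkos” (an untested sketch; it assumes Kokkos was built with the SYCL backend and that the execution space still exposes sycl_queue() the way I remember):

#include <Kokkos_Core.hpp>
#include <sycl/sycl.hpp>

// Return the queue the default Kokkos SYCL instance uses, so any oneMKL DFT
// work we submit stays on the same device/queue as the rest of the KOKKOS
// package. Must be called after Kokkos::initialize().
sycl::queue get_kokkos_queue()
{
  return Kokkos::Experimental::SYCL().sycl_queue();
}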
Does this file exist in your oneapi installation? /opt/intel/oneapi/mkl/latest/examples/examples_dpcpp.tgz
It has some examples for 1D FFTs with oneapi/mkl.
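From memory, the flow in those examples is something like the following (treat it as a sketch and double-check the descriptor/compute_forward signatures against your oneMKL version):

#include <oneapi/mkl/dfti.hpp>
#include <sycl/sycl.hpp>
#include <complex>
#include <cstdint>

// In-place forward 1D complex-to-complex FFT on USM data.
// 'inout' must be device-accessible on the same device as 'q'.
void fft_1d_forward(sycl::queue &q, std::complex<double> *inout, std::int64_t n)
{
  namespace dft = oneapi::mkl::dft;
  dft::descriptor<dft::precision::DOUBLE, dft::domain::COMPLEX> desc(n);
  desc.commit(q);                            // bind the plan to the device queue
  dft::compute_forward(desc, inout).wait();  // returns a sycl::event
}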
OK, the dpcpp/dft/source/dp_complex_1d.cpp example is good. I’m asking on the Kokkos Slack about getting the SYCL queue through Kokkos. I’ll try to get this working quickly if possible.
I am running experiments with the build that I have, and I found a behavior with MPI that is confusing. Even though I am using Intel MPI (not MPICH), I get this warning:
I can’t comment on the level of support in Intel MPI from the public SDKs. I can say the top portion of kokkos.cpp will need to be updated to detect GPU-awareness and assign devices to MPI ranks (or you can do the MPI-GPU binding yourself).
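If you want to do the binding yourself, the usual recipe is a round-robin of node-local ranks over the visible GPUs before Kokkos::initialize(). A rough sketch (the helper name is mine, not anything in LAMMPS):

#include <mpi.h>
#include <sycl/sycl.hpp>

// Pick a GPU index for this rank: node-local rank modulo number of GPUs.
int pick_device(MPI_Comm comm)
{
  MPI_Comm node_comm;
  MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);
  int local_rank;
  MPI_Comm_rank(node_comm, &local_rank);
  MPI_Comm_free(&node_comm);

  int ngpus = (int) sycl::device::get_devices(sycl::info::device_type::gpu).size();
  return ngpus > 0 ? local_rank % ngpus : 0;
}

// then, e.g.:
// Kokkos::initialize(Kokkos::InitializationSettings().set_device_id(pick_device(MPI_COMM_WORLD)));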
For GPU-aware testing, I would take a step back and try a simple test code to make sure it works in your local setup.
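Something like this is what I have in mind: two ranks exchanging a device-resident buffer. If it crashes inside MPI_Send/MPI_Recv, the MPI you are linking against is not GPU-aware (or still needs an env var). This is just a sketch I have not run on your stack:

#include <mpi.h>
#include <sycl/sycl.hpp>
#include <cstdio>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  sycl::queue q{sycl::gpu_selector_v};
  double *buf = sycl::malloc_device<double>(1, q);
  q.fill(buf, rank == 0 ? 3.14 : 0.0, 1).wait();

  // Pass the device pointer straight to MPI: this only works with GPU-aware MPI.
  if (rank == 0) {
    MPI_Send(buf, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
  } else if (rank == 1) {
    MPI_Recv(buf, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    double host = 0.0;
    q.memcpy(&host, buf, sizeof(double)).wait();
    std::printf("rank 1 received %g\n", host);
  }

  sycl::free(buf, q);
  MPI_Finalize();
  return 0;
}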
What input deck are you testing, and what are you comparing against? A multi-core CPU? An A100?
Intel MPI is a derivative of MPICH; see Intel® MPI Library. If you are getting segmentation faults with -pk kokkos gpu/aware on, then your MPI really isn’t GPU-aware, but it may just need an env var set, for example export I_MPI_OFFLOAD=1; see GPU Support and Intel® MPI for GPU Clusters.
We don’t auto-detect I_MPI_OFFLOAD=1 in LAMMPS yet, but setting that plus -pk kokkos gpu/aware on could work.
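If/when we add that, I expect the check would just look at the environment, similar to how the existing GPU-aware-MPI detection for other MPI flavors works. A hypothetical sketch, not anything currently in kokkos.cpp:

#include <cstdlib>
#include <cstring>

// Hypothetical helper: treat any non-zero I_MPI_OFFLOAD value as
// "Intel MPI was asked to be GPU-aware".
bool intel_mpi_gpu_aware()
{
  const char *val = std::getenv("I_MPI_OFFLOAD");
  return val && std::strcmp(val, "0") != 0;
}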
Another note: export MPICH_GPU_SUPPORT_ENABLED=1 works on Aurora, Frontier, Polaris, and Perlmutter, but I’m not sure whether that is general or specific to those machines, which have Cray MPICH.