LAMMPS with Kokkos using SYCL for Intel PVC GPU

I told cmake to use my MKL library that came with my compilers. Compilation reached all the way to linking before failing.

[100%] Building CXX object CMakeFiles/lmp.dir/sw/hprc/sw/LAMMPS/21Nov2023-intel-2023.07-kokkos/apps/lammps/src/main.cpp.o
icx: warning: argument unused during compilation: '-fgpu-inline-threshold=100000' [-Wunused-command-line-argument]
icx: warning: argument unused during compilation: '-Xsycl-target-frontend -O3' [-Wunused-command-line-argument]
[100%] Linking CXX executable lmp
/sw/eb/sw/binutils/2.40-GCCcore-13.2.0/bin/ld: liblammps.a(input.cpp.o): undefined reference to symbol '_ZTVN10__cxxabiv121__vmi_class_type_infoE@@CXXABI_1.3'
/sw/eb/sw/binutils/2.40-GCCcore-13.2.0/bin/ld: /sw/eb/sw/GCCcore/13.2.0/lib64/libstdc++.so.6: error adding symbols: DSO missing from command line
icx: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/lmp.dir/build.make:106: lmp] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:397: CMakeFiles/lmp.dir/all] Error 2
gmake: *** [Makefile:136: all] Error 2

I noticed that cmake picked my CMAKE_CXX_COMPILER to be icx, rather than icpx, clang++, or dpcpp for example. Should I have picked one of those?
Also, my C++ standard library (libstdc++) didn’t come with the Intel compilers; they use the one that came with the GNU compilers. Did I need to tell cmake that?

Not sure about the linking issue. I only have experience with the oneAPI compiler on Aurora (see “oneAPI Overview | Argonne Leadership Computing Facility”). Here is the output from Aurora:

mpicxx --version
Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.x.0.20230131)
Target: x86_64-unknown-linux-gnu

When you do get it to compile, this check will prevent you from running with MKL on the GPU:

We’d need to tweak it so that if using SYCL it does not error out, and probably add a check that the MKL lib does support GPUs. But for now you can just remove it.

I am getting so close. By switching from icx to icpx and adding the -lsycl linker flag, I was able to get the link to succeed.

[ 98%] Linking CXX static library liblammps.a
[100%] Built target lammps
[100%] Building CXX object CMakeFiles/lmp.dir/sw/hprc/sw/LAMMPS/21Nov2023-oneAPI-2023.2-kokkos/apps/lammps/src/main.cpp.o
icpx: warning: argument unused during compilation: '-fgpu-inline-threshold=100000' [-Wunused-command-line-argument]
icpx: warning: argument unused during compilation: '-Xsycl-target-frontend -O3' [-Wunused-command-line-argument]
[100%] Linking CXX executable lmp
[100%] Built target lmp

The executable works fine in serial mode, but it crashes if I try to use a GPU.

lmp -in in.intel.lj -v N off -k on g 1 -sf k
LAMMPS (21 Nov 2023)
KOKKOS mode is enabled (src/KOKKOS/kokkos.cpp:71)
  will use up to 1 GPU(s) per node
Exception: No kernel named _ZTSZN5desul4Impl21init_lock_arrays_syclIiEEvN4sycl3_V15queueEEUlvE_ was found -46 (PI_ERROR_INVALID_KERNEL_NAME)
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Kokkos::Experimental::SYCL ERROR: Failed to call Kokkos::Experimental::SYCL::finalize()

Kokkos developers are suggesting there is a problem with the way the application is getting linked.

I suspect the issue is that the link line does not use full paths to the Kokkos libraries (where all the device code is located). This is a temporary issue that’s being worked on.

You could do a quick test by copying the link line that cmake executes, confirm that -l was used for both Kokkos libraries, and replace those references with full paths to the Kokkos libraries.
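
For what it's worth, here is a minimal sketch of that quick test in Python. Nothing in it is part of LAMMPS, Kokkos, or CMake: the helper names (`kokkos_dash_l_flags`, `absolutize_static_libs`), the sample link line, and the `/tmp/build` directory are all made up for illustration. It only does text manipulation on a link line you paste in: it lists any `-lkokkos...` flags and rewrites relative static-library paths into absolute ones so you can rerun the link command by hand.

```python
import os
import shlex


def kokkos_dash_l_flags(link_line):
    """Return the tokens on the link line that pull in Kokkos via -l."""
    return [tok for tok in shlex.split(link_line) if tok.startswith("-lkokkos")]


def absolutize_static_libs(link_line, build_dir):
    """Return the link line with relative static-library paths made absolute.

    Only tokens ending in ".a" that are not already absolute are rewritten
    (resolved against build_dir). Flags, object files, and shared libraries
    are passed through unchanged.
    """
    rewritten = []
    for tok in shlex.split(link_line):
        if tok.endswith(".a") and not os.path.isabs(tok):
            tok = os.path.normpath(os.path.join(build_dir, tok))
        rewritten.append(tok)
    return shlex.join(rewritten)


if __name__ == "__main__":
    # Hypothetical, shortened link line; paste the real one from a verbose build.
    sample = (
        "icpx -O2 main.cpp.o -o lmp liblammps.a "
        "lib/kokkos/core/src/libkokkoscore.a /opt/mkl/libmkl_rt.so -ldl"
    )
    print("Kokkos -l flags:", kokkos_dash_l_flags(sample))
    print(absolutize_static_libs(sample, "/tmp/build"))
```

The rewritten line is only meant for a one-off manual relink from the build directory; it does not change what cmake generates.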


[...]/icpx -Wall -Wextra -g -O2 -DNDEBUG -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker [...]/release -Xlinker -rpath -Xlinker [...]/lib "[...]/main.cpp.o" -o lmp liblammps.a [...]/libmpicxx.so [...]/libmpifort.so [...]/libmpi.so /lib64/libdl.so /lib64/librt.so /lib64/libpthread.so [...]/libmkl_rt.so -lm lib/kokkos/containers/src/libkokkoscontainers.a lib/kokkos/core/src/libkokkoscore.a lib/kokkos/simd/src/libkokkossimd.a -ldl

The LAMMPS libraries are being referred to by relative path (not -l).

I was able to reproduce this locally with cmake and am debugging. Building with the Makefile.aurora_kokkos in the meantime should let you make forward progress.

I am considering using a newer Intel compiler, which requires a newer Kokkos. I found that LAMMPS doesn’t bundle the appropriate Kokkos in any release, but I found the PR for it.

Update Kokkos library bundled in LAMMPS to v4.2 #3983
Merged: akohlmey merged 3 commits into lammps:develop from stanmoore1:kk_update_4.2 on Nov 24, 2023

Is this safe to use?

Update: no, it is not safe to use. There was an improperly defined printf() somewhere that the Intel compiler didn’t like. The whole develop branch seems to have this issue.

Please see this page, which explains how to locate the different “versions” of LAMMPS available in the git repository and the rationale behind them.
https://docs.lammps.org/latest/Manual_version.html

LAMMPS follows a “continuous release” policy that tries to keep the “develop” branch fully functional. Most issues are detected by the testing and code review of pull requests before merging into the “develop” branch. This is not perfect but quite good in most cases.

The LAMMPS developers tend to stay close to the head of the “develop” branch all the time and use that in order to detect issues. We also use a variety of code analysis tools that will eventually report some overlooked issues.

Feature releases are safer to use, stable releases even more so (most bugs are introduced with new features, and we backport bugfixes to the stable release until the next stable release), but those fall behind the bleeding edge, and if you need the bleeding edge, there is no alternative to following the “develop” branch.

Can you be more specific on this?

Kokkos 4.2 has a known issue compiling with the latest SYCL that was fixed for Make but not for CMake. This does not seem to be a LAMMPS issue.

Which issue? How was it fixed? There is no difference in the source code between the GNU make build and the CMake build of LAMMPS.

I am less confident about my diagnosis. What I know is that I am running into a problem that is supposed to be fixed.

error: experimental::printf requires format string to reside in constant address space. The compiler wasn't able to automatically convert your format string into constant address space when processing builtin _ZN4sycl3_V13ext6oneapi12experimental6printfIcJPKcEEEiPKT_DpT0_ called in function _ZN6Kokkos6printfIJPKcEEEvS2_DpT_.
Consider simplifying the code by passing format strings directly into experimental::printf calls, avoiding indirection via wrapper function arguments.
7 warnings and 1 error generated.
make[2]: *** [CMakeFiles/lammps.dir/build.make:7286: CMakeFiles/lammps.dir/[...]/lammps/src/KOKKOS/comm_kokkos.cpp.

Update: nope, still confused.

You could try applying this change from upstream to lib/kokkos/core/src/Kokkos_Printf.hpp

  diff --git a/core/src/Kokkos_Printf.hpp b/core/src/Kokkos_Printf.hpp
  index 39f95825c..af20221a5 100644
  --- a/core/src/Kokkos_Printf.hpp
  +++ b/core/src/Kokkos_Printf.hpp
  @@ -31,7 +31,7 @@ namespace Kokkos {
   // backends. The GPU backends always return 1 and NVHPC only compiles if we
   // don't ask for the return value.
   template <typename... Args>
  -KOKKOS_FUNCTION void printf(const char* format, Args... args) {
  +KOKKOS_FORCEINLINE_FUNCTION void printf(const char* format, Args... args) {
   #ifdef KOKKOS_ENABLE_SYCL
     // Some compilers warn if "args" is empty and format is not a string literal
     if constexpr (sizeof...(Args) == 0)

Yes, using upstream Kokkos corrected that problem. However, it did not correct my application failure. The application still crashes if I try to use a GPU.

lmp -in in.intel.lj -k on g 1 -sf kk 
LAMMPS (21 Nov 2023)
KOKKOS mode is enabled (src/KOKKOS/kokkos.cpp:71)
  will use up to 1 GPU(s) per node
Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
Created orthogonal box = (0 0 0) to (134.3677 67.183848 67.183848)
  1 by 1 by 1 MPI processor grid
Created 512000 atoms
  using lattice units in orthogonal box = (0 0 0) to (134.3677 67.183848 67.183848)
  create_atoms CPU = 0.149 seconds
Exception: The program was built for 1 devices
Build program log for 'Intel(R) Data Center GPU Max 1100':
(many lines similar to the following:)
error : unresolved external symbol _ZN9LAMMPS_NS17ComputeTempKokkosIN6Kokkos12Experimental4SYCLEE7s_CTEMPC1Ev at offset 27796 in instructions segment #7 (aka kernel : _ZTSZZNK6Kokkos4Impl14ParallelReduceINS0_22CombinedFunctorReducerIN9LAMMPS_NS17ComputeTempKokkosINS_12Experimental4SYCLEEENS0_15FunctorAnalysisINS0_23FunctorPatternInterface6REDUCEENS_11RangePolicyIJS6_NS3_20TagComputeTempVectorILi0EEEEEES7_NS7_7s_CTEMPEE7ReducerEvEESE_S6_E18sycl_direct_launchISE_NS5_4Impl19SYCLFunctionWrapperISI_NSL_12SYCLInternal12USMObjectMemILN4sycl3_V13usm5allocE0EEELb0EEEEENSQ_5eventERKT_RKT0_RKSV_ENKUlRNSQ_7handlerEE_clES15_EUlvE_)
 -42 (PI_ERROR_INVALID_BINARY)

Are you just using the /bench/in.rhodo input, or something else?

This one is src/INTEL/TESTS/in.intel.lj.
I have previously run it on PVC GPUs (but not under Kokkos).

I still don’t understand how to use Makefile.aurora_kokkos.
I am not on Aurora; my compilers aren’t installed in the same directories, for example.

“my compilers aren’t installed in the same directories for example”

This shouldn’t matter. As long as you have a working mpicxx then it should work. The only hard-coded paths would be for the FFT library. And you could grab those from the CMake output.

FFT_INC =
FFT_PATH =
FFT_LIB =