Building lammps with gpu and kokkos

Dear All,

I’ve been working on this for several weeks, attempting to build and run LAMMPS with both GPU and Kokkos support.
I need Kokkos and the GPU to run reaction models with large systems.

key words: gpu kokkos lammps build

SYSTEM
OS Linux Mint 21.1, AMD TR 7900, RTX 3090, LAMMPS 8Feb2023, Kokkos-4,
NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0
Build cuda_11.7.r11.7/compiler.31294372_0
CUDA works with VMD, GROMACS, and Maestro

BACKGROUND
I have installed Kokkos with cmake using
cmake ../ -D Kokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE86=ON -D Kokkos_ENABLE_CUDA_LAMBDA=ON -D Kokkos_ENABLE_OPENMP=yes ../cmake

=============================
With this installation I can compile LAMMPS to run with Kokkos on the CPU only (no GPU) using
cmake -C ../cmake/presets/most.cmake -D Kokkos_ENABLE_OPENMP=yes -D PKG_KOKKOS=yes -D Kokkos_ENABLE_THREADS=ON ../cmake

and for example
mpirun -np 1 lmp-k -k on t 16 -sf kk -in in.rhodo

==============================
Similarly, without invoking Kokkos, I can use cmake to build for the GPU:
cmake -C ../cmake/presets/most.cmake -D PKG_GPU=on -D GPU_API=cuda -DCMAKE_CUDA_ARCHITECTURES=86 ../cmake
With this build, for example, mpirun -np 1 lmp -sf gpu -pk gpu 1 -in in.lj works as expected.

ISSUE:
However, if I try to compile LAMMPS with Kokkos for use with the GPU (RTX 3090), using either the most or the basic preset:
cmake -C ../cmake/presets/most.cmake -D PKG_GPU=on -D Kokkos_ENABLE_CUDA=yes -D PKG_KOKKOS=yes ../cmake

or with the cmake command
cmake -C ../cmake/presets/most.cmake -D PKG_GPU=on -D GPU_API=cuda -DCMAKE_CUDA_ARCHITECTURES=86 -D PKG_KOKKOS=yes -D Kokkos_ENABLE_CUDA=ON -D Kokkos_ENABLE_THREADS=ON ../cmake

compilation proceeds (with multiple warnings); then, after 100% is reached, the following error appears:



/tmp/ccDhsuWv.s:4323155: Error: symbol `fatbinData' is already defined
( many lines )
/tmp/ccDhsuWv.s:4327364: Error: symbol `fatbinData' is already defined
lto-wrapper: fatal error: /usr/bin/c++ returned 1 exit status
compilation terminated.

/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/lmp.dir/build.make:117: lmp] Error 1
make[1]: *** [CMakeFiles/Makefile2:391: CMakeFiles/lmp.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

I can find little information on the "fatbinData is already defined" error.

CUDA was installed via the local deb package from NVIDIA, and I gather that the lto-wrapper is critical.

The full output of cmake for lammps is attached.

lammps build.txt

In essence, any combination of packages ends in failure if I try to enable both the GPU and Kokkos.

Any tips would be appreciated.

====
I have the full output for the builds , but as a new user I do not have the privilege of uploading those

   =============  prior GitHub data/suggestions ===========================

akohlm… assigned stanmoore1 Apr 24, 2023
added the kokkos_package label Apr 24, 2023
added this to LAMMPS Bug Reports Apr 24, 2023
moved this to High Priority Bugs in LAMMPS Bug Reports Apr 24, 2023

Contributor
stanmoore1 commented Apr 25, 2023

Hi, I have not encountered this error before, but somehow link time optimization (LTO) is getting enabled. According to the NVIDIA Developer Forums thread "Link-time optimization with CUDA on Linux (-flto)" (reply #2 by wlangdon, CUDA Programming and Performance), can you try using -Xcompiler -fno-lto and see if that fixes the issue?
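
One way to look for where LTO is being switched on is to scan the compiler command lines involved for an -flto flag. The sketch below only assumes two places worth checking (the MPI compiler wrapper and an existing CMakeCache.txt); it is not a diagnosis of this particular build. has_lto_flag is a helper name made up for illustration, and mpicxx -show is the MPICH spelling (Open MPI uses --showme).

```shell
# Return success if the given compiler command line contains a GCC-style LTO flag
# (-flto or -flto=...). Flags such as -fno-lto or -ffat-lto-objects do not count.
has_lto_flag() {
  case " $* " in
    *" -flto "*|*" -flto="*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check the MPI compiler wrapper, if one is installed (MPICH spelling of the option).
if command -v mpicxx >/dev/null 2>&1; then
  if has_lto_flag "$(mpicxx -show 2>/dev/null)"; then
    echo "mpicxx adds an LTO flag"
  else
    echo "no LTO flag seen in mpicxx -show"
  fi
fi

# Check the cached flags of an existing CMake build (run from the build directory).
if [ -f CMakeCache.txt ]; then
  grep -n -e '-flto' CMakeCache.txt || echo "no -flto in CMakeCache.txt"
fi
```

If a flag does show up, that narrows down which tool would need -fno-lto (or a different compiler) rather than guessing.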
stanmoore1
Contributor
commented Apr 25, 2023

Alternatively, can you try adding the -dlto flag to enable LTO on device? See https://developer.nvidia.com/blog/improving-gpu-app-performance-with-cuda-11-2-device-lto/ and bug report 811162, "media-libs/opencv-4.5.2-r1 lto-wrapper: fatal error".
stanmoore1 added the cmake label Apr 25, 2023

Oh, also a suggestion from XXweinbe2: try using the Kokkos nvcc_wrapper as the compiler, something like this:
-D CMAKE_CXX_COMPILER=$(pwd)/…/lib/kokkos/bin/nvcc_wrapper

Contributor
sta commented Apr 25, 2023

If you aren’t using nvcc_wrapper then that is most likely the cause of the issue.
stanmoore1 added the invalid label Apr 26, 2023
moved this from High Priority Bugs to Done in LAMMPS Bug Reports Apr 26, 2023

Contributor
stanmoore1 commented Apr 26, 2023

I’m pretty sure that using nvcc_wrapper will fix your issue. [Poster's note: this did not fix the error.]

In any case, since this is not a bug in LAMMPS but rather a build question I will close this issue on GitHub. Feel free to continue the discussion on MatSci: LAMMPS - Materials Science Community Discourse if you need more help. Thanks
sta closed this as completed Apr 26, 2023

Auth commented Apr 26, 2023

running kokkos with gpu #3751
Closed
busce004 opened this issue Apr 24, 2023 · 6 comments

I have not rebuilt CUDA; it functions with other MD programs (VMD, Maestro, GROMACS).
To use device LTO, add the option -dlto to both the compilation and link commands as shown below. Skipping the -dlto option from either of these two steps affects your results.

Compilation of CUDA source files with the -dlto option:

nvcc -dc -dlto *.cu

Linking of CUDA object files with the -dlto option:

nvcc -dlto *.o
Using -dlto option at compile time instructs the compiler to store a high-level intermediate representation (NVVM-IR) of the device code being compiled into the fatbinary. The -dlto option at link time will instruct the linker to retrieve the NVVM IR from all the link objects and merge them together into a single IR and perform optimization on the resulting IR for code generation. Device LTO works with any supported SM arch target.

@pbuscemi did you try adding the -dlto or -fno-lto options?

No I did not, I was under the impression those options were used during the cuda toolbox build and that is functioning normally. How are they implemented ?

pb

I’ve tried using -D CMAKE_CXX_FLAGS="-O3 -flto -fuse-linker-plugin" and -D CMAKE_CXX_FLAGS="-O3 -dlto" as part of the cmake flags. Neither produced a new error; both ended with the same "fatbinData is already defined" error.

I’ve also experimented with nvcc -dc -dlto *.cu as part of the compile (make) step but, frankly, I do not know how to set the .cu target or, in general, how to apply nvcc -dc -dlto *.cu within the build.

For what it is worth: using Kokkos_ENABLE_CUDA, Kokkos_ARCH_AMPERE86, and PKG_GPU but NOT PKG_KOKKOS=yes results in the fatbinData error, i.e. the issue resides in part with PKG_KOKKOS=yes.

Can you try a simpler CMake command? This works for me with GCC 8.3.1, CUDA 11.2.0. Just change the Kokkos_ARCH to match your GPU (ARCH_AMPERE86).

cmake -D CMAKE_CXX_COMPILER=$(pwd)/../lib/kokkos/bin/nvcc_wrapper -D BUILD_MPI=yes -D BUILD_OMP=no -D PKG_KOKKOS=ON -D Kokkos_ARCH_VOLTA70=ON -D Kokkos_ENABLE_CUDA=ON ../cmake

Also can you try using the GNU Makefile build:

cd lammps/src
make yes-kokkos
make -j64 kokkos_cuda_mpi KOKKOS_ARCH=Ampere86

People have built the LAMMPS KOKKOS package for GPUs on many different machines and never encountered this issue that I know of, so it must be some difference in operating system libraries, CMake command, or similar.

Thanks for the response. The OS is Linux mint and it could have some differences. It is not machine specific since the same issue arises on three different machines with mint.

CMake on those machines works well for installing GROMACS, so I do not think the problem is CMake.

Would you suggest an Ubuntu OS that is known to work?

Regards, Paul

We are currently using Ubuntu 20.04LTS in our continuous integration testing for compiling/testing LAMMPS with GPU support. I also regularly compile LAMMPS with KOKKOS and CUDA 12.1 on Fedora 36 for testing.

If you have singularity/apptainer installed, you can build a GPU compatible container similar to what we use, from the definition file in tools/singularity/ubuntu20.04_gpu.def (no need to install a different OS altogether).
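
A rough sketch of what building and entering such a container could look like, run from the top of the LAMMPS source tree. The image name ubuntu20.04_gpu.sif and the helper container_runtime are made up for illustration, and whether the build step needs root or --fakeroot depends on how apptainer/singularity is set up on the machine.

```shell
# Print the name of whichever container runtime is installed; fail if neither is.
container_runtime() {
  for rt in apptainer singularity; do
    if command -v "$rt" >/dev/null 2>&1; then
      printf '%s\n' "$rt"
      return 0
    fi
  done
  return 1
}

# Hypothetical image name; the definition file path is relative to the LAMMPS source tree.
image=ubuntu20.04_gpu.sif
def=tools/singularity/ubuntu20.04_gpu.def

if rt=$(container_runtime) && [ -f "$def" ]; then
  "$rt" build "$image" "$def"   # may need root or --fakeroot, depending on the setup
  "$rt" shell --nv "$image"     # --nv makes the host NVIDIA driver visible inside
else
  echo "run this from the LAMMPS source directory with apptainer or singularity installed"
fi
```

Inside the container shell, the usual cmake or make steps for the KOKKOS package can then be tried against a toolchain that is known to work.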

Thank you all for the continuing effort in helping me out

I will try the simpler cmake for kokkos later tonight. I’ll also look into the singularity/apptainer approach

Regarding make: it did not work, but the error may be telling. The warning and the error below point to a problem locating nvcc from the nvcc_wrapper.

The warning and error extracted from the listing at the end
================= warning ===============
Warning: Cuda Lambda support was requested but NVCC version is too low. This requires NVCC for Cuda version 7.5 or higher. Disabling Lambda support now.
But nvcc --version reports:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
So an incorrect version of nvcc may be in use.
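
A "version is too low" warning next to "test: -ge: unexpected operator" would also be consistent with the version query returning nothing at all, for example because nvcc is not visible in the shell that runs the build. A minimal check, to be run in exactly the same shell (and under the same user) as the make command; nvcc_release is a helper name made up for this sketch.

```shell
# Extract the "release X.Y" number from `nvcc --version` text read on stdin.
# Prints nothing if no release line is present.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
  echo "nvcc found at: $(command -v nvcc)"
  echo "release: $(nvcc --version | nvcc_release)"
else
  echo "nvcc is not on PATH in this shell"
fi
```

If this prints a path and 11.7 in a normal shell but reports that nvcc is missing under the build invocation, the environment of the build is the thing to look at rather than the CUDA installation.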

also
=================== error =========================

Desktop/lammps-8Feb2023/lib/kokkos/bin/nvcc_wrapper: line 640: nvcc: command not found

Line 640 is:
TMPDIR=${temp_dir} $nvcc_command
where temp_dir is defined at line 101:
if [[ -z ${NVCC_WRAPPER_TMPDIR+x} ]]; then
  temp_dir=${TMPDIR:-/tmp}
else
  temp_dir=${NVCC_WRAPPER_TMPDIR}
fi

================== partial error listing ===============

(base) pb@asus:~/Desktop/lammps-8Feb2023/src$ sudo make -j64 kokkos_cuda_mpi KOKKOS_ARCH=Ampere86
[sudo] password f
Gathering installed package information (may take a little while)
make[1]: warning: -j64 forced in submake: resetting jobserver mode.
make[1]: Entering directory ‘/home/pb/Desktop/lammps-8Feb2023/src’
make[1]: ‘lmpinstalledpkgs.h’ is up to date.
Gathering git version information
make[1]: Leaving directory ‘/home/pb/Desktop/lammps-8Feb2023/src’
Compiling LAMMPS for machine kokkos_cuda_mpi
make[1]: warning: -j64 forced in submake: resetting jobserver mode.
make[1]: Entering directory ‘/home/pb/Desktop/lammps-8Feb2023/src/Obj_kokkos_cuda_mpi’
/bin/sh: 1: test: -ge: unexpected operator
/bin/sh: 1: test: -gt: unexpected operator
../../lib/kokkos/Makefile.kokkos:732: Warning: Cuda Lambda support was requested but NVCC version is too low. This requires NVCC for Cuda version 7.5 or higher. Disabling Lambda support now.
cc -O -o fastdep.exe ../DEPEND/fastdep.c
make[1]: Leaving directory ‘/home/pb/Desktop/lammps-8Feb2023/src/Obj_kokkos_cuda_mpi’
make[1]: warning: -j64 forced in submake: resetting jobserver mode.
make[1]: Entering directory ‘/home/pb/Desktop/lammps-8Feb2023/src/Obj_kokkos_cuda_mpi’
/bin/sh: 1: test: -ge: unexpected operator
/bin/sh: 1: test: -gt: unexpected operator
../../lib/kokkos/Makefile.kokkos:732: Warning: Cuda Lambda support was requested but NVCC version is too low. This requires NVCC for Cuda version 7.5 or higher. Disabling Lambda support now.
mpicxx -g -O3 -DNDEBUG -Xcudafe --diag_suppress=unrecognized_pragma -DLAMMPS_GZIP -DLMP_KOKKOS -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX=1 -DFFT_CUFFT -I./ -I../../lib/kokkos/core/src -I../../lib/kokkos/containers/src -I../../lib/kokkos/algorithms/src -I../../lib/kokkos/tpls/desul/include -std=c++14 -arch=sm_86 -I./ -I../../lib/kokkos/core/src -I../../lib/kokkos/containers/src -I../../lib/kokkos/algorithms/src -I../../lib/kokkos/tpls/desul/include -c ../main.cpp

Why do you use “sudo” to compile? That is a very, very bad idea. You can easily ruin your whole system with this. There is no need for it. “sudo” will likely sanitize your environment and thus not use the same paths and executables that would be used without.

Re sudo: one of the errors suggested that nvcc_wrapper was not found, and I thought it might be a permission issue. Admittedly it was not a particularly stellar move. At least I did not use 777.

On the brighter side, from the start the suggestions and indications were that the faults were nvcc related. nvcc_wrapper exists in three locations on my workstation: /usr/local/bin/nvcc_wrapper, /home/pb/.local/bin/nvcc_wrapper, and the kokkos folder. The first two were in my PATH; I had assumed that Kokkos would use the one in its own folder. Not so. I fought the PATH and the PATH won.
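
For anyone hitting the same thing, a small sketch for listing every copy of a tool that PATH can see, in lookup order, so that a stray nvcc_wrapper shows up before the build does. list_on_path is a helper name made up here; in bash, type -a nvcc_wrapper gives similar information.

```shell
# Print every executable named "$1" found on PATH, in lookup order.
# The first line printed is the copy the shell would actually run.
# The body runs in a subshell so the IFS change does not leak out.
list_on_path() (
  name=$1
  IFS=:
  for dir in $PATH; do
    [ -n "$dir" ] || dir=.
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
)

list_on_path nvcc_wrapper
list_on_path nvcc
```

More than one line of output for nvcc_wrapper means the build may pick up a different copy than the one shipped with the Kokkos sources, unless CMAKE_CXX_COMPILER is given as a full path.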

Using
cmake ../ -D CMAKE_CXX_COMPILER=/home/pb/Desktop/kokkos-4/bin/nvcc_wrapper -D Kokkos_ARCH_AMPERE86=ON -D Kokkos_ENABLE_CUDA=ON -D Kokkos_ENABLE_OPENMP=yes ../cmake


resulted in a build with the necessary CUDA back end:

-- Setting default Kokkos CXX standard to 17
-- The CXX compiler identification is GNU 11.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/pb/Desktop/kokkos-4/bin/nvcc_wrapper - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'RelWithDebInfo' as none was specified.
-- The project name is: Kokkos
-- Using internal gtest for testing
-- Compiler Version: 11.7.64
-- Using -std=c++17 for C++17 standard as feature
-- Built-in Execution Spaces:
-- Device Parallel: Kokkos::Cuda
-- Host Parallel: Kokkos::OpenMP
-- Host Serial: NONE

-- Architectures:
-- AMPERE86
-- Found CUDAToolkit: /usr/local/cuda-11.7/include (found version "11.7.64")
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found TPLCUDA: TRUE
-- Found TPLLIBDL: /usr/include
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Using internal desul_atomics copy
-- Kokkos Devices: OPENMP;CUDA, Kokkos Backends: OPENMP;CUDA
-- Configuring done
-- Generating done

==== proceeding with cmake for lammps======

cmake ../ -D Kokkos_ARCH_AMPERE86=ON -D Kokkos_ENABLE_CUDA=ON -D Kokkos_ARCH_GPU=yes -D Kokkos_ENABLE_OPENMP=yes ../cmake

The configuration and compilation completed with warnings but no errors.

So pointing to the Kokkos nvcc_wrapper removed the error.

However, bench on lj is a work in progress…

mpirun lmp-kg -k on g 1 -sf kk -pk kokkos -in in.lj
LAMMPS (28 Mar 2023)
KOKKOS mode is enabled (src/KOKKOS/kokkos.cpp:106)
will use up to 1 GPU(s) per node
terminate called after throwing an instance of ‘std::runtime_error’

I’ll hack at this for a while.

Thank you all again.

@pbuscemi I’m glad you were able to compile. Regarding the runtime error, let us know if you need help debugging. The easiest is probably to use gdb to get a stack trace if it is in CPU code.
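
A minimal sketch of getting such a stack trace non-interactively, assuming gdb is installed and the failure reproduces with a single process started without mpirun. run_with_backtrace is a helper name made up for this example; an uncaught std::runtime_error ends in an abort, which is where gdb stops and prints the trace.

```shell
# Run a command under gdb in batch mode and print a backtrace when it stops.
# Falls back to running the command directly if gdb is not installed.
run_with_backtrace() {
  if command -v gdb >/dev/null 2>&1; then
    gdb -batch -ex run -ex bt --args "$@"
  else
    "$@"
  fi
}

# Same options as the failing benchmark run, single process, without mpirun:
# run_with_backtrace lmp-kg -k on g 1 -sf kk -in in.lj
```

The build type reported above (RelWithDebInfo) should already include debug symbols for the Kokkos library; the trace is only readable for LAMMPS itself if it was compiled with -g as well.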