How to enable cuFFT in a Kokkos CUDA build?

I keep getting Kokkos configured with KISS FFT instead of cuFFT for my CUDA build.

There's a legacy Makefile setting, FFT_INC = -DFFT_CUFFT, FFT_LIB = -lcufft, but there's no CMake equivalent as far as I know.
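
For reference, a rough sketch of where that legacy setting lives in the traditional make build (the machine makefile name below is just an example):

# src/MAKE/MACHINES/Makefile.mymachine (legacy make build, not CMake)
FFT_INC = -DFFT_CUFFT
FFT_LIB = -lcufft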

The docs say: "This will also enable executing FFTs on the GPU, either via the internal KISSFFT library, or - by preference - with the cuFFT library bundled with the CUDA toolkit, depending on whether CMake can identify its location."

I can see libcufft in LIBRARY_PATH but not LD_LIBRARY_PATH. Maybe that's why autodetection doesn't work? Maybe I need to fix my cluster install of CUDA with an explicit LD_LIBRARY_PATH? That's what I'm trying next.

build-pascal60.out (103.8 KB)

I looked at KOKKOS.cmake but can't figure it out.

A few warnings bother me:

– Checking for module ‘fftw3’
– Package ‘fftw3’, required by ‘virtual:world’, not found

– CUDA auto-detection of architecture failed with /cvmfs/soft.computecanada.ca/gentoo/2023/x86-64-v3/usr/x86_64-pc-linux-gnu/gcc-bin/12/c++. Enabling CUDA language ONLY to auto-detect architecture…

– Check for working CUDA compiler: /cvmfs/restricted.computecanada.ca/easybuild/software/2023/x86-64-v3/Core/nvhpc/23.9/Linux_x86_64/23.9/compilers/bin/nvcc - skipped

/home/~~~/scratch/lammps/src/KOKKOS/fft3d_kokkos.cpp(44): warning #177-D: variable “ngpus” was declared but never referenced
int ngpus = lmp->kokkos->ngpus;
^

/home/~~~/scratch/lammps/src/KOKKOS/fft3d_kokkos.cpp(45): warning #177-D: variable “execution_space” was declared but never referenced
ExecutionSpace execution_space = ExecutionSpaceFromDevice::space;
^

ptxas warning : Stack size for entry function '_ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForI16kiss_fft_functorINS_4CudaEENS_11RangePolicyIJS4_EEES4_EEEEvT_' cannot be statically determined

Next I'm going to start experimenting with KokkosTools, as suggested by @stamoor, to see if I can learn more about optimizing my application and about the smallest number of CPUs/GPUs that fits my problem size without needing UVM.

            614751  atoms
            419078  bonds
            243891  angles
             81972  dihedrals
              3524  impropers
               974  crossterms

                72  atom types
               123  bond types
               267  angle types
               550  dihedral types
                25  improper types

Also, should I use -D Kokkos_ARCH_NATIVE=yes or -D Kokkos_ARCH_BDW=yes for the host architecture? Does it make a difference?
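
I.e., something like one of these on the cmake command line (assuming a Kokkos version recent enough to have Kokkos_ARCH_NATIVE; the Pascal GPU flag is only shown for context):

cmake ... -D Kokkos_ARCH_NATIVE=yes -D Kokkos_ARCH_PASCAL60=yes ...   # detect the host CPU at build time
cmake ... -D Kokkos_ARCH_BDW=yes -D Kokkos_ARCH_PASCAL60=yes ...      # name Broadwell explicitly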

I'm building lammps-pascal60 on a compute node with 1 GPU and 24 CPUs to replicate the deployment environment as closely as possible and eliminate cross-compiling as a source of issues. I'm testing on 4/4 CPUs/GPUs to validate multi-node runs (i.e. where the GPUs are not all on the same node, since nodes have 2 GPUs each), and then I'll scale to a range of 16/16 to 64/64 CPUs/GPUs (8-32 nodes) for production runs.

It is possible, see [BUG] Unable to enable hipFFT-backend with KSPACE + Kokkos with cmake · Issue #3775 · lammps/lammps · GitHub. However, it works much better with this pending PR: KSPACE: decouple KOKKOS and non-KOKKOS FFT by hagertnl · Pull Request #4007 · lammps/lammps · GitHub. Do you want to be a beta tester? :grinning:

Sure, I enjoy being cannon fodder at the bleeding edge. One of my favorite TV show lines ever is from Chief Boden on Chicago Fire: "leaders lead from the front".

I'll download the patch from the PR, build it, and see what happens. What should I be looking for and how should I be testing? I'm not familiar with unit testing procedures in the LAMMPS package(s).

Anytime you want me to beta test (or even alpha test) anything related to Kokkos, let me know.

Have you ever considered using the VkFFT GitHub package? It seems pretty mature and well maintained.

VkFFT supports Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal, and I've noticed quite a few GitHub issues about HIP with KOKKOS and other LAMMPS packages.

It adds a new CMake FFT_KOKKOS option, which would be set to CUFFT. Just see if it gives the expected behavior with cuFFT.

See KSPACE: decouple KOKKOS and non-KOKKOS FFT by hagertnl · Pull Request #4007 · lammps/lammps · GitHub.
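
A minimal sketch of a configure line with that PR applied (the arch flag and source path are just examples for a Pascal build like yours):

cmake -D PKG_KSPACE=yes -D PKG_KOKKOS=yes \
      -D Kokkos_ENABLE_CUDA=yes -D Kokkos_ARCH_PASCAL60=yes \
      -D FFT_KOKKOS=CUFFT \
      ../cmake
# the configure output should then report cuFFT (not KISS) for the Kokkos FFT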

Re: VkFFT, it could be added in theory, but we already have native cuFFT and also heFFTe support, so the code is already pretty complex; there would need to be a compelling reason, e.g. it being much faster than the existing options. There is also talk of a Kokkos wrapper for FFTs, which could simplify everything.

I'm trying to learn how to merge two GitHub pull requests from two different forks into my own, so I can build it for my application, which absolutely needs the CMAP fixes from @akohlmey; otherwise my protein-DNA simulation doesn't work.

Up to now I've been doing:

git clone -b cmap-fixes-for-charmm-gui https://github.com/akohlmey/lammps.git

but that won't also include PR 4007 at the same time.

Sorry, I'm new to GitHub, so I'm googling right now to see how I can merge both those PRs into my own alphataubio fork and then clone that to the cluster.

Maybe you can give me some hints…

I add each repository as a “remote”: Git - git-remote Documentation

git remote add hagertnl git@github.com:hagertnl/lammps-fork.git
git fetch hagertnl
git pull hagertnl issue3775_fft_kokkos
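
A sketch of how you could combine both branches in your own fork (the combined-test branch name and the SSH URL for your fork are just examples):

git clone git@github.com:alphataubio/lammps.git; cd lammps
git checkout -b combined-test
# CMAP fixes
git remote add akohlmey https://github.com/akohlmey/lammps.git
git fetch akohlmey
git merge akohlmey/cmap-fixes-for-charmm-gui
# Kokkos FFT PR
git remote add hagertnl git@github.com:hagertnl/lammps-fork.git
git fetch hagertnl
git merge hagertnl/issue3775_fft_kokkos
# push the combined branch back to your fork
git push -u origin combined-test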

I'm getting "CUDA driver is a stub library" but this might be a local problem with my build environment; Lmod modules are extra flimsy here on the clusters.

Details are on GitHub PR 4007.

"CUDA driver is a stub library "

Yes, in my experience this means your CUDA install is borked.

Nod. It is surprising how many people in our business don’t know the difference between LIBRARY_PATH and LD_LIBRARY_PATH. … or even know of the former.
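
For what it's worth: LIBRARY_PATH is searched by the compiler/linker at build time, while LD_LIBRARY_PATH is searched by the dynamic loader at run time. A few quick checks (the lmp binary name is just an example):

ldconfig -p | grep cufft               # libraries the runtime loader knows about system-wide
echo $LD_LIBRARY_PATH | tr ':' '\n'    # run-time search path, one entry per line
ldd ./lmp | grep cufft                 # what the built executable will actually load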

I know about LD_LIBRARY_PATH. In fact, I tried really hard for 2-3 days to get the nvhpc module from NVIDIA (both 22.x and 23.x versions) working, because I was so interested in the CUDA-aware OpenMPI bundled with it. The odd thing was that it wasn't seeing the CUDA runtime even though all the .so's were in both LIBRARY_PATH and LD_LIBRARY_PATH.

In the end, I just loaded the cmake, cuda, and gcc modules and built openmpi-5.0.1 from source, configured with CUDA, on the login node (which has GPUs attached) of a POWER9/V100 cluster and on a compute node with GPUs of another Intel/P100 cluster:

wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.1.tar.gz
tar xzvf openmpi-5.0.1.tar.gz; cd openmpi-5.0.1
module purge; module load MistEnv/2021a cmake/3.21.4 cuda/11.7.1 gcc/11.4.0

./configure --prefix=$HOME/scratch/local/openmpi-5.0.1 --with-cuda=$CUDA_HOME --with-cuda-libdir=$CUDA_HOME/lib64/stubs; make -j 64 all; make install
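
A quick sanity check that the resulting OpenMPI actually picked up CUDA support (assuming the install prefix above):

$HOME/scratch/local/openmpi-5.0.1/bin/ompi_info | grep -i cuda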

This is the modulefile I wrote and put in $HOME/local/modules/openmpi/5.0.1:

#%Module
set prefix {~/scratch/local/openmpi-5.0.1}
set version {5.0.1}
prepend-path CMAKE_PREFIX_PATH ${prefix}
prepend-path PATH ${prefix}/bin
prepend-path CPATH ${prefix}/include
prepend-path LIBRARY_PATH ${prefix}/lib
prepend-path LD_LIBRARY_PATH ${prefix}/lib
prepend-path MANPATH ${prefix}/share/man
prepend-path PKG_CONFIG_PATH ${prefix}/lib/pkgconfig
setenv MODULE_OPENMPI_PREFIX ${prefix}
prepend-path MODULEPATH ~/scratch/local/openmpi-5.0.1

https://lmod.readthedocs.io/en/latest/015_writing_modules.html

module use $HOME/local/modules
module purge; module load MistEnv/2021a cuda/11.7.1 gcc/11.4.0 openmpi/5.0.1
cmake …
lmp …