I keep getting Kokkos configured with KISS FFT instead of cuFFT for my CUDA build.
There's a legacy Makefile setting (FFT_INC = -DFFT_CUFFT, FFT_LIB = -lcufft), but as far as I know there's no CMake equivalent.
The docs say: “This will also enable executing FFTs on the GPU, either via the internal KISSFFT library, or - by preference - with the cuFFT library bundled with the CUDA toolkit, depending on whether CMake can identify its location.”
I can see libcufft in LIBRARY_PATH but not in LD_LIBRARY_PATH. Maybe that's why autodetection doesn't work? Maybe I need to fix my cluster's CUDA install by setting LD_LIBRARY_PATH explicitly? That's what I'm trying next.
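To check what is actually visible on each path, here is a minimal sketch that lists which directories in LIBRARY_PATH and LD_LIBRARY_PATH contain a libcufft shared object. As far as I know, gcc uses LIBRARY_PATH when linking, while LD_LIBRARY_PATH is what the dynamic loader searches at run time; I haven't confirmed whether CMake's library search looks at either, so this only shows what those two variables expose. The function name `find_lib_dirs` is my own, not something from LAMMPS or CMake.

```python
import glob
import os


def find_lib_dirs(path_list, libname="cufft"):
    """Return the directories in a colon-separated path list that contain lib<libname>.so*."""
    hits = []
    for d in path_list.split(os.pathsep):
        if not d:
            continue  # skip empty entries, e.g. from a trailing or doubled ':'
        # matches libcufft.so, libcufft.so.11, ... but not libcufftw.so
        if glob.glob(os.path.join(d, f"lib{libname}.so*")):
            hits.append(d)
    return hits


if __name__ == "__main__":
    for var in ("LIBRARY_PATH", "LD_LIBRARY_PATH"):
        dirs = find_lib_dirs(os.environ.get(var, ""))
        print(f"{var}: {dirs if dirs else 'no libcufft found'}")
```

If libcufft only shows up under LIBRARY_PATH, linking may succeed while the runtime loader still can't find it without an extra LD_LIBRARY_PATH entry (or an rpath).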
Build log attached: build-pascal60.out (103.8 KB)
I looked at KOKKOS.cmake but can't figure it out.
A few warnings bother me:
-- Checking for module 'fftw3'
-- Package 'fftw3', required by 'virtual:world', not found
-- CUDA auto-detection of architecture failed with /cvmfs/soft.computecanada.ca/gentoo/2023/x86-64-v3/usr/x86_64-pc-linux-gnu/gcc-bin/12/c++. Enabling CUDA language ONLY to auto-detect architecture...
-- Check for working CUDA compiler: /cvmfs/restricted.computecanada.ca/easybuild/software/2023/x86-64-v3/Core/nvhpc/23.9/Linux_x86_64/23.9/compilers/bin/nvcc - skipped
/home/~~~/scratch/lammps/src/KOKKOS/fft3d_kokkos.cpp(44): warning #177-D: variable "ngpus" was declared but never referenced
int ngpus = lmp->kokkos->ngpus;
^
/home/~~~/scratch/lammps/src/KOKKOS/fft3d_kokkos.cpp(45): warning #177-D: variable "execution_space" was declared but never referenced
ExecutionSpace execution_space = ExecutionSpaceFromDevice::space;
^
ptxas warning : Stack size for entry function '_ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForI16kiss_fft_functorINS_4CudaEENS_11RangePolicyIJS4_EEES4_EEEEvT_' cannot be statically determined
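The ptxas warning names `kiss_fft_functor` instantiated for `Kokkos::Cuda`, which suggests the KISS code path was compiled for the GPU. A rough sketch for pulling every FFT-related line out of the attached log so they can be read in one place, assuming the log is plain text; the search terms are just my guess at useful keywords, nothing LAMMPS defines.

```python
import re
import sys

# Case-insensitive: catches KISS, kiss_fft_functor, cuFFT, -DFFT_CUFFT, fftw3, fft3d_kokkos, ...
FFT_PATTERN = re.compile(r"fft|kiss", re.IGNORECASE)


def fft_lines(log_text):
    """Return (line_number, line) pairs for every log line that mentions FFT or KISS."""
    return [
        (n, line)
        for n, line in enumerate(log_text.splitlines(), start=1)
        if FFT_PATTERN.search(line)
    ]


if __name__ == "__main__":
    # usage: python fft_lines.py build-pascal60.out
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for n, line in fft_lines(fh.read()):
                print(f"{path}:{n}: {line}")
```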
Next, I'm starting to fiddle with KokkosTools, as suggested by @stamoor, to see if I can learn more about optimizing my application and find the smallest number of CPUs/GPUs that fits my problem size without needing UVM:
614751 atoms
419078 bonds
243891 angles
81972 dihedrals
3524 impropers
974 crossterms
72 atom types
123 bond types
267 angle types
550 dihedral types
25 improper types
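As a first sizing estimate, a back-of-the-envelope split of the 614751 atoms over the GPU counts I'm considering, assuming one MPI rank per GPU and an ideal even decomposition. Real subdomains will be uneven and ghost atoms come on top, and this says nothing about memory per atom, which is what actually decides whether UVM is needed.

```python
import math

N_ATOMS = 614751  # atom count from the system summary above


def atoms_per_gpu(n_atoms, n_gpus):
    """Owned atoms per GPU for an ideal even split, rounded up; ghost atoms are not counted."""
    return math.ceil(n_atoms / n_gpus)


if __name__ == "__main__":
    for g in (4, 16, 32, 64):  # the 4-GPU test size and the planned production range
        print(f"{g:2d} GPUs: ~{atoms_per_gpu(N_ATOMS, g)} atoms per GPU")
```

That works out to roughly 153688, 38422, 19211 and 9606 owned atoms per GPU for 4, 16, 32 and 64 GPUs.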
Also, should I use -D Kokkos_ARCH_HOSTARCH=NATIVE or -D Kokkos_ARCH_HOSTARCH=BDW? Does it make a difference?
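One way to inform the choice is to check which instruction-set extensions the node's CPU reports. My understanding, not verified against the Kokkos docs, is that BDW targets Intel Broadwell (AVX2), while NATIVE lets the compiler tune for the CPU it builds on; the `x86-64-v3` paths in the log also suggest the software stack is built for the AVX2 level. A minimal Linux-only sketch that reads the `flags` line from /proc/cpuinfo; the flag list is my own pick.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of ISA flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Extensions that could differ between an AVX2 (Broadwell-level) target and NATIVE.
INTERESTING = ("avx2", "fma", "avx512f")

if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as fh:
        flags = cpu_flags(fh.read())
    for f in INTERESTING:
        print(f"{f}: {'yes' if f in flags else 'no'}")
```

If the node reports `avx512f`, NATIVE may generate code that BDW would not; if it tops out at AVX2, the two targets are probably much closer.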
I'm building lammps-pascal60 on a compute node with 1 GPU and 24 CPUs to replicate the deployment environment as closely as possible and rule out cross-compiling as a source of issues. I'm testing with 4 CPUs/4 GPUs to validate multi-node runs (i.e. where the GPUs are not all on the same node, since nodes have 2 GPUs each), and then I'll scale to between 16/16 and 64/64 CPUs/GPUs (8-32 nodes) for production runs.
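For the 4/4 multi-node test, here is a sketch of the placement I'm aiming for, assuming ranks fill nodes in order (block placement), one rank per GPU, and the node-local rank used as the GPU index. These are assumptions about how I'd launch the job, not something Slurm, mpirun or Kokkos guarantees; the function name `placement` is hypothetical.

```python
GPUS_PER_NODE = 2  # each node has 2 GPUs


def placement(n_ranks, gpus_per_node=GPUS_PER_NODE):
    """Map each global MPI rank to (node index, local GPU id) under block placement, 1 rank per GPU."""
    return {r: (r // gpus_per_node, r % gpus_per_node) for r in range(n_ranks)}


if __name__ == "__main__":
    for n in (4, 16, 64):
        nodes = len({node for node, _ in placement(n).values()})
        print(f"{n} ranks/GPUs -> {nodes} nodes")
    print(placement(4))  # the 4/4 test: two ranks on each of two nodes
```

With 2 GPUs per node this gives 2 nodes for the 4/4 test and 8 and 32 nodes for 16/16 and 64/64, so the 4/4 run already crosses a node boundary.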