How to enable cuFFT in a Kokkos CUDA build?

I keep getting Kokkos configured with KISS FFT instead of cuFFT for my CUDA build.

There's a legacy Makefile setting, FFT_INC = -DFFT_CUFFT, FFT_LIB = -lcufft, but there's no CMake equivalent as far as I know.
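
For reference, a rough sketch of where that legacy setting lives in the traditional make build (the machine makefile name below is just an example):

# src/MAKE/MACHINES/Makefile.mymachine (legacy make build, not CMake)
FFT_INC = -DFFT_CUFFT
FFT_LIB = -lcufft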

The docs say: "This will also enable executing FFTs on the GPU, either via the internal KISSFFT library, or - by preference - with the cuFFT library bundled with the CUDA toolkit, depending on whether CMake can identify its location."

I can see libcufft in LIBRARY_PATH but not LD_LIBRARY_PATH. Maybe that's why autodetection doesn't work? Maybe I need to fix my cluster install of CUDA with an explicit LD_LIBRARY_PATH? That's what I'm trying next.

build-pascal60.out (103.8 KB)

I looked at KOKKOS.cmake but can't figure it out.

A few warnings bother me:

– Checking for module ‘fftw3’
– Package ‘fftw3’, required by ‘virtual:world’, not found

– CUDA auto-detection of architecture failed with /cvmfs/soft.computecanada.ca/gentoo/2023/x86-64-v3/usr/x86_64-pc-linux-gnu/gcc-bin/12/c++. Enabling CUDA language ONLY to auto-detect architecture…

– Check for working CUDA compiler: /cvmfs/restricted.computecanada.ca/easybuild/software/2023/x86-64-v3/Core/nvhpc/23.9/Linux_x86_64/23.9/compilers/bin/nvcc - skipped

/home/~~~/scratch/lammps/src/KOKKOS/fft3d_kokkos.cpp(44): warning #177-D: variable “ngpus” was declared but never referenced
int ngpus = lmp->kokkos->ngpus;
^

/home/~~~/scratch/lammps/src/KOKKOS/fft3d_kokkos.cpp(45): warning #177-D: variable “execution_space” was declared but never referenced
ExecutionSpace execution_space = ExecutionSpaceFromDevice::space;
^

ptxas warning : Stack size for entry function '_ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForI16kiss_fft_functorINS_4CudaEENS_11RangePolicyIJS4_EEES4_EEEEvT_' cannot be statically determined

Next I'm going to start experimenting with KokkosTools, as suggested by @stamoor, to see if I can learn more about optimizing my application and about the smallest number of CPUs/GPUs that fits my problem size without needing UVM.

            614751  atoms
            419078  bonds
            243891  angles
             81972  dihedrals
              3524  impropers
               974  crossterms

                72  atom types
               123  bond types
               267  angle types
               550  dihedral types
                25  improper types

Also, should I use -D Kokkos_ARCH_NATIVE=yes or -D Kokkos_ARCH_BDW=yes for the host architecture? Does it make a difference?
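
I.e., something like one of these on the cmake command line (assuming a Kokkos version recent enough to have Kokkos_ARCH_NATIVE; the Pascal GPU flag is only shown for context):

cmake ... -D Kokkos_ARCH_NATIVE=yes -D Kokkos_ARCH_PASCAL60=yes ...   # detect the host CPU at build time
cmake ... -D Kokkos_ARCH_BDW=yes -D Kokkos_ARCH_PASCAL60=yes ...      # name Broadwell explicitly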

I'm building lammps-pascal60 on a compute node with 1 GPU and 24 CPUs to replicate the deployment environment as closely as possible and eliminate cross-compiling as a source of issues. I'm testing on 4/4 CPUs/GPUs to validate multi-node runs (i.e. where the GPUs are not all on the same node, since nodes have 2 GPUs each), and then I'll scale to a range of 16/16 to 64/64 CPUs/GPUs (8-32 nodes) for production runs.

It is possible, see [BUG] Unable to enable hipFFT-backend with KSPACE + Kokkos with cmake · Issue #3775 · lammps/lammps · GitHub. However, it works much better with this pending PR: KSPACE: decouple KOKKOS and non-KOKKOS FFT by hagertnl · Pull Request #4007 · lammps/lammps · GitHub. Do you want to be a beta tester? :grinning:

Sure, I enjoy being cannon fodder at the bleeding edge. One of my favorite TV show lines ever is from Chief Boden on Chicago Fire: "leaders lead from the front".

I'll download the patch from the PR, build it, and see what happens. What should I be looking for and how should I be testing? I'm not familiar with unit testing procedures in the LAMMPS package(s).

Anytime you want me to beta test (or even alpha test) anything related to Kokkos, let me know.

Have you ever considered using the VkFFT GitHub package? It seems pretty mature and well maintained.

VkFFT supports Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal, and I've noticed quite a few GitHub issues about HIP with KOKKOS and other LAMMPS packages.

It adds a new CMake FFT_KOKKOS option, which would be set to CUFFT. Just see if it gives the expected behavior with cuFFT.

See KSPACE: decouple KOKKOS and non-KOKKOS FFT by hagertnl · Pull Request #4007 · lammps/lammps · GitHub.
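
A minimal sketch of a configure line with that PR applied (the arch flag and source path are just examples for a Pascal build like yours):

cmake -D PKG_KSPACE=yes -D PKG_KOKKOS=yes \
      -D Kokkos_ENABLE_CUDA=yes -D Kokkos_ARCH_PASCAL60=yes \
      -D FFT_KOKKOS=CUFFT \
      ../cmake
# the configure output should then report cuFFT (not KISS) for the Kokkos FFT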

Re: VkFFT, it could be added in theory, but we already have native cuFFT and also heFFTe support, so the code is already pretty complex; there would need to be a compelling reason, e.g. it being much faster than the existing options. There is also talk of a Kokkos wrapper for FFTs, which could simplify everything.

I'm trying to learn how to merge two GitHub pull requests from two different forks into my own, so I can build it for my application, which absolutely needs the CMAP fixes from @akohlmey; otherwise my protein-DNA simulation doesn't work.

Up to now I've been doing:

git clone -b cmap-fixes-for-charmm-gui https://github.com/akohlmey/lammps.git

but that won't also include PR 4007 at the same time.

Sorry, I'm new to GitHub, so I'm googling right now to see how I can merge both those PRs into my own alphataubio fork and then clone that to the cluster.

Maybe you can give me some hints…

I add each repository as a “remote”: Git - git-remote Documentation

git remote add hagertnl git@github.com:hagertnl/lammps-fork.git
git fetch hagertnl
git pull hagertnl issue3775_fft_kokkos
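
A sketch of how you could combine both branches in your own fork (the combined-test branch name and the SSH URL for your fork are just examples):

git clone git@github.com:alphataubio/lammps.git; cd lammps
git checkout -b combined-test
# CMAP fixes
git remote add akohlmey https://github.com/akohlmey/lammps.git
git fetch akohlmey
git merge akohlmey/cmap-fixes-for-charmm-gui
# Kokkos FFT PR
git remote add hagertnl git@github.com:hagertnl/lammps-fork.git
git fetch hagertnl
git merge hagertnl/issue3775_fft_kokkos
# push the combined branch back to your fork
git push -u origin combined-test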

I'm getting "CUDA driver is a stub library" but this might be a local problem with my build environment; Lmod modules are extra flimsy here on the clusters.

Details are on GitHub PR 4007.

"CUDA driver is a stub library "

Yes, in my experience this means your CUDA install is borked.

Nod. It is surprising how many people in our business don’t know the difference between LIBRARY_PATH and LD_LIBRARY_PATH. … or even know of the former.
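
For what it's worth: LIBRARY_PATH is searched by the compiler/linker at build time, while LD_LIBRARY_PATH is searched by the dynamic loader at run time. A few quick checks (the lmp binary name is just an example):

ldconfig -p | grep cufft               # libraries the runtime loader knows about system-wide
echo $LD_LIBRARY_PATH | tr ':' '\n'    # run-time search path, one entry per line
ldd ./lmp | grep cufft                 # what the built executable will actually load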

I know about LD_LIBRARY_PATH. In fact, I tried really hard for 2-3 days to get the nvhpc module from NVIDIA (both 22.x and 23.x versions) working, because I was so interested in the CUDA-aware OpenMPI bundled with it. The odd thing was that it wasn't seeing the CUDA runtime even though all the .so's were in both LIBRARY_PATH and LD_LIBRARY_PATH.

In the end, I just loaded the cmake, cuda, and gcc modules and built openmpi-5.0.1 from source, configured with CUDA, on the login node (which has GPUs attached) of a POWER9/V100 cluster and on a compute node with GPUs of another Intel/P100 cluster:

wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.1.tar.gz
tar xzvf openmpi-5.0.1.tar.gz; cd openmpi-5.0.1
module purge; module load MistEnv/2021a cmake/3.21.4 cuda/11.7.1 gcc/11.4.0

./configure --prefix=$HOME/scratch/local/openmpi-5.0.1 --with-cuda=$CUDA_HOME --with-cuda-libdir=$CUDA_HOME/lib64/stubs; make -j 64 all; make install
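
A quick sanity check that the resulting OpenMPI actually picked up CUDA support (assuming the install prefix above):

$HOME/scratch/local/openmpi-5.0.1/bin/ompi_info | grep -i cuda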

This is the modulefile I wrote and put in $HOME/local/modules/openmpi/5.0.1:

#%Module
set prefix {~/scratch/local/openmpi-5.0.1}
set version {5.0.1}
prepend-path CMAKE_PREFIX_PATH ${prefix}
prepend-path PATH ${prefix}/bin
prepend-path CPATH ${prefix}/include
prepend-path LIBRARY_PATH ${prefix}/lib
prepend-path LD_LIBRARY_PATH ${prefix}/lib
prepend-path MANPATH ${prefix}/share/man
prepend-path PKG_CONFIG_PATH ${prefix}/lib/pkgconfig
setenv MODULE_OPENMPI_PREFIX ${prefix}
prepend-path MODULEPATH ~/scratch/local/openmpi-5.0.1

https://lmod.readthedocs.io/en/latest/015_writing_modules.html

module use $HOME/local/modules
module purge; module load MistEnv/2021a cuda/11.7.1 gcc/11.4.0 openmpi/5.0.1
cmake …
lmp …