Hello everyone,
I’m hitting a blocker when trying to compile LAMMPS with Kokkos’s CUDA backend inside Docker, using OpenMPI 4.1.7 installed under /opt/hpcx/ompi in place of the built-in HPC-X MPI installation, because the latter doesn’t properly support Slurm[1]. The base image is provided by NVIDIA, since it was the quickest starting point I found with CUDA and PyTorch properly installed.
The reason I opted for Docker is that the image needs to run on a cluster where installing the whole environment manually is rather convoluted, especially PyTorch. I’d also like a fully reproducible build, no matter which machine I’m using (after adjusting the architecture flags appropriately).
Environment setup
- MPI: OpenMPI 4.1.7 built & installed to `/opt/hpcx/ompi`
- Docker ENV:
ENV CC=/opt/hpcx/ompi/bin/mpicc \
    CXX=/opt/hpcx/ompi/bin/mpicxx \
    MPICC=/opt/hpcx/ompi/bin/mpicc \
    MPICX=/opt/hpcx/ompi/bin/mpicxx \
    MPI_CXX_COMPILER=/opt/hpcx/ompi/bin/mpicxx \
    MPI_C_COMPILER=/opt/hpcx/ompi/bin/mpicc \
    NVCC_WRAPPER_DEFAULT_COMPILER=/opt/hpcx/ompi/bin/mpicxx
- Kokkos: installed under `/opt/lammps/lib/kokkos`
CMake invocation
cmake ../cmake \
-D CMAKE_INSTALL_PREFIX=/opt/lammps \
-D CMAKE_PREFIX_PATH="/usr/local/lib/python3.12/dist-packages/torch/share/cmake;/opt/hpcx/ompi" \
-D CMAKE_C_COMPILER=$CC \
-D CMAKE_CXX_COMPILER=$CXX \
-D CMAKE_CUDA_COMPILER=/opt/lammps/lib/kokkos/bin/nvcc_wrapper \
-D BUILD_MPI=ON \
-D Kokkos_ENABLE_CUDA=ON \
-D Kokkos_ENABLE_OPENMP=ON \
-D Kokkos_ARCH_SKX=ON \
-D Kokkos_ARCH_PASCAL61=ON \
# ... other flags
You can have a look at the full Dockerfile here.
Error
-- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS)
CMake Error: Could NOT find MPI (missing: MPI_CXX_FOUND)
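When `find_package(MPI)` fails like this, CMake logs the exact compile line it attempted for the `MPI_CXX_WORKS` probe, which usually shows why `nvcc_wrapper` rejected it. A minimal check from the build directory (the log file name depends on the CMake version):

```shell
# CMake records failed try-compiles; the MPI_CXX_WORKS probe failure is in here.
# CMake < 3.26:
cat CMakeFiles/CMakeError.log
# CMake >= 3.26:
cat CMakeFiles/CMakeConfigureLog.yaml
```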
What I’ve tried
- `-D MPI_CXX_SKIP_MPICXX=TRUE` plus manual `MPI_CXX_ADDITIONAL_INCLUDE_DIRS`/`MPI_CXX_LIBRARIES` overrides
- Removing all wrapper hacks (`OMPI_CXX`, `MPICH_CXX`)
- Swapping `CMAKE_CXX_COMPILER` between `mpicxx` and `nvcc_wrapper`
- Explicitly setting `CMAKE_CUDA_HOST_COMPILER=$CXX`
- Overriding the host compiler that `nvcc_wrapper` uses via `NVCC_WRAPPER_DEFAULT_COMPILER`

Nothing has made `MPI_CXX_WORKS` turn TRUE when `nvcc_wrapper` is in play.
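For concreteness, one representative failing variant, reconstructed from the flags above (a sketch of what I attempted, not a working configuration):

```shell
# nvcc_wrapper as the C++ compiler, with MPI paths forced manually.
cmake ../cmake \
  -D CMAKE_CXX_COMPILER=/opt/lammps/lib/kokkos/bin/nvcc_wrapper \
  -D CMAKE_CUDA_HOST_COMPILER=/opt/hpcx/ompi/bin/mpicxx \
  -D MPI_CXX_SKIP_MPICXX=TRUE \
  -D MPI_CXX_ADDITIONAL_INCLUDE_DIRS=/opt/hpcx/ompi/include \
  -D MPI_CXX_LIBRARIES=/opt/hpcx/ompi/lib/libmpi.so \
  -D Kokkos_ENABLE_CUDA=ON
```

This still ends with the same `MPI_CXX_WORKS` failure.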
Important note: if I completely disable CUDA support in Kokkos, the Docker image builds fine. So I don’t think the problem lies with OpenMPI or the compilers themselves.
Background
- The Kokkos manual recommends using `nvcc_wrapper` as a drop-in C++ compiler and overriding the host compiler via `NVCC_WRAPPER_DEFAULT_COMPILER` or `-ccbin`.
- A GitHub issue reports that setting variables like `OMPI_CXX` breaks the mechanism of `nvcc_wrapper`, which in turn makes CMake fail when looking for a valid MPI installation.
- The LAMMPS package that specifically requires the PyTorch library is ML-MACE. The suggested way to install and use it with LAMMPS is documented here.
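To illustrate the mechanism the Kokkos manual describes (a sketch; the `mpicxx` path is the one from my image): `nvcc_wrapper` forwards host-side compilation to whatever compiler `NVCC_WRAPPER_DEFAULT_COMPILER` (or `-ccbin`) names, so the two invocations below should be roughly equivalent.

```shell
# Host compiler picked up from the environment by nvcc_wrapper...
export NVCC_WRAPPER_DEFAULT_COMPILER=/opt/hpcx/ompi/bin/mpicxx
/opt/lammps/lib/kokkos/bin/nvcc_wrapper -c foo.cpp

# ...which is roughly what passing the host compiler to nvcc directly does:
nvcc -ccbin /opt/hpcx/ompi/bin/mpicxx -c foo.cpp
```

My understanding is that this is exactly the mechanism the `OMPI_CXX` hack interferes with.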
What is the exact combination of:
- `CMAKE_CXX_COMPILER`,
- `CMAKE_CUDA_COMPILER`,
- `CMAKE_CUDA_HOST_COMPILER`,
- `NVCC_WRAPPER_DEFAULT_COMPILER`,
- `MPI_CXX_SKIP_MPICXX` (and any `MPI_*` overrides)

that allows:
- CMake’s `find_package(MPI)` to detect a working `mpicxx`,
- Kokkos’s CUDA backend to compile all `.cu` and CUDA-templated `.cpp` files for whatever architecture is set by `Kokkos_ARCH_GPUARCH`,
- inside a Docker container, without manual flag hacking?
Any pointers to a minimal working Dockerfile or CMake flags combination would be greatly appreciated.
Thanks in advance!
The exact error that made me switch was: “The application appears to have been direct launched using `srun`, but OMPI was not built with SLURM’s PMI support and therefore cannot execute.” ↩︎
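For context, direct `srun` launch requires OpenMPI to be configured against Slurm’s PMI library, which is why I rebuilt OpenMPI myself. A configure line along these lines (a sketch; the `--with-pmi` prefix is a hypothetical value and depends on where the cluster installs `slurm/pmi.h`):

```shell
# Build OpenMPI 4.1.x with Slurm PMI support so srun can launch ranks directly.
./configure --prefix=/opt/hpcx/ompi \
            --with-slurm \
            --with-pmi=/usr
make -j"$(nproc)" install
```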