Hello everyone,
I’m hitting a blocker when trying to compile LAMMPS with Kokkos’s CUDA backend inside Docker, using OpenMPI 4.1.7 installed under /opt/hpcx/ompi as a replacement for the built-in HPC-X MPI installation, which doesn’t properly support Slurm[1]. The base image is provided by NVIDIA, since it’s the quickest starting point I found with CUDA and PyTorch properly installed.
I opted for Docker because the image needs to run on a cluster where setting up the whole environment by hand is rather convoluted, especially PyTorch. I’d also like a fully reproducible build no matter what machine I’m using (after adjusting the architecture flags accordingly).
Environment setup
- MPI: OpenMPI 4.1.7 built & installed to /opt/hpcx/ompi
- Docker ENV:
  ENV CC=/opt/hpcx/ompi/bin/mpicc \
      CXX=/opt/hpcx/ompi/bin/mpicxx \
      MPICC=/opt/hpcx/ompi/bin/mpicc \
      MPICXX=/opt/hpcx/ompi/bin/mpicxx \
      MPI_CXX_COMPILER=/opt/hpcx/ompi/bin/mpicxx \
      MPI_C_COMPILER=/opt/hpcx/ompi/bin/mpicc \
      NVCC_WRAPPER_DEFAULT_COMPILER=/opt/hpcx/ompi/bin/mpicxx
- Kokkos: installed under /opt/lammps/lib/kokkos
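As a quick sanity check of this toolchain, the probe CMake performs for MPI_CXX_WORKS can be approximated by hand, once with mpicxx directly and once through nvcc_wrapper. This is only a sketch: it assumes the paths listed above, uses g++ as the host compiler, and hands MPI to nvcc_wrapper via explicit -I/-L/-lmpi flags rather than through the MPI wrapper.
# Minimal MPI translation unit, roughly what CMake's FindMPI test compiles.
cat > /tmp/mpi_probe.cpp <<'EOF'
#include <mpi.h>
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}
EOF
# 1) Plain OpenMPI wrapper -- expected to work.
/opt/hpcx/ompi/bin/mpicxx /tmp/mpi_probe.cpp -o /tmp/mpi_probe_mpicxx
# 2) Through nvcc_wrapper, i.e. roughly what the Kokkos CUDA build does.
#    Assumption: MPI headers/libs live under /opt/hpcx/ompi/include and /opt/hpcx/ompi/lib.
NVCC_WRAPPER_DEFAULT_COMPILER=g++ \
  /opt/lammps/lib/kokkos/bin/nvcc_wrapper \
  -I/opt/hpcx/ompi/include \
  /tmp/mpi_probe.cpp \
  -L/opt/hpcx/ompi/lib -lmpi \
  -o /tmp/mpi_probe_nvcc
If the second command already fails, its error message is usually more informative than CMake’s summary line.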
CMake invocation
cmake ../cmake \
-D CMAKE_INSTALL_PREFIX=/opt/lammps \
-D CMAKE_PREFIX_PATH="/usr/local/lib/python3.12/dist-packages/torch/share/cmake;/opt/hpcx/ompi" \
-D CMAKE_C_COMPILER=$CC \
-D CMAKE_CXX_COMPILER=$CXX \
-D CMAKE_CUDA_COMPILER=/opt/lammps/lib/kokkos/bin/nvcc_wrapper \
-D BUILD_MPI=ON \
-D Kokkos_ENABLE_CUDA=ON \
-D Kokkos_ENABLE_OPENMP=ON \
-D Kokkos_ARCH_SKX=ON \
-D Kokkos_ARCH_PASCAL61=ON \
# ... other flags
You can have a look at the full Dockerfile here.
Error
-- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS)
CMake Error: Could NOT find MPI (missing: MPI_CXX_FOUND)
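The summary above hides the actual compiler error; the output of the failing try_compile probe ends up in CMake’s logs. From the build directory (the log file name depends on the CMake version):
# CMake < 3.26 collects failed try_compile output here:
grep -B 2 -A 25 "MPI" CMakeFiles/CMakeError.log
# CMake >= 3.26 writes a structured configure log instead:
less CMakeFiles/CMakeConfigureLog.yaml
# Re-running cmake with --debug-trycompile keeps the failing test projects
# on disk, so the exact nvcc_wrapper/mpicxx command line can be re-run by hand.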
What I’ve tried
- Setting -D MPI_CXX_SKIP_MPICXX=TRUE plus manual MPI_CXX_ADDITIONAL_INCLUDE_DIRS / MPI_CXX_LIBRARIES overrides
- Removing all wrapper hacks (OMPI_CXX, MPICH_CXX)
- Swapping CMAKE_CXX_COMPILER between mpicxx and nvcc_wrapper
- Explicitly setting CMAKE_CUDA_HOST_COMPILER=$CXX
- Overriding the host compiler that nvcc_wrapper uses via NVCC_WRAPPER_DEFAULT_COMPILER
Nothing has made MPI_CXX_WORKS turn TRUE when nvcc_wrapper is in play.
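To take LAMMPS and Kokkos’s CMake glue out of the equation, the detection step can be reproduced with a bare find_package(MPI) project while nvcc_wrapper is the C++ compiler. A minimal sketch, reusing the paths above (MPI_CXX_COMPILER and MPI_HOME are standard FindMPI hint variables):
mkdir -p /tmp/findmpi && cd /tmp/findmpi
cat > CMakeLists.txt <<'EOF'
cmake_minimum_required(VERSION 3.16)
project(findmpi CXX)
find_package(MPI REQUIRED COMPONENTS CXX)
EOF
cmake -S . -B build \
  -D CMAKE_CXX_COMPILER=/opt/lammps/lib/kokkos/bin/nvcc_wrapper \
  -D MPI_CXX_COMPILER=/opt/hpcx/ompi/bin/mpicxx \
  -D MPI_HOME=/opt/hpcx/ompi
If this already fails with the same MPI_CXX_WORKS message, the issue is purely in how FindMPI probes through nvcc_wrapper and has nothing to do with the LAMMPS tree.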
Important note: if I completely disable CUDA support in Kokkos, the image builds fine, so I don’t think the problem lies with OpenMPI or the compilers themselves.
Background
- The Kokkos manual recommends using nvcc_wrapper as a drop-in C++ compiler and overriding the host compiler via NVCC_WRAPPER_DEFAULT_COMPILER or -ccbin (a quick way to inspect what each wrapper actually invokes is sketched after this list).
- A GitHub issue reports that setting variables like OMPI_CXX breaks the mechanism of nvcc_wrapper, which in turn makes CMake fail when looking for a valid MPI installation.
- The LAMMPS package that specifically requires the PyTorch library is ML-MACE. The suggested way to install and use it with LAMMPS is documented here.
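Since most of the confusion is about which compiler each wrapper ends up invoking, both can be asked to print their underlying command line. OpenMPI’s --showme options are documented; --show is, as far as I can tell, nvcc_wrapper’s dry-run flag, so treat that part as an assumption:
# What mpicxx expands to (OpenMPI wrapper introspection):
/opt/hpcx/ompi/bin/mpicxx --showme:command
/opt/hpcx/ompi/bin/mpicxx --showme:compile
# What nvcc_wrapper would run, including the host compiler it picked
# (--show should print the nvcc command without executing it):
touch /tmp/dummy.cpp
NVCC_WRAPPER_DEFAULT_COMPILER=/opt/hpcx/ompi/bin/mpicxx \
  /opt/lammps/lib/kokkos/bin/nvcc_wrapper --show -c /tmp/dummy.cpp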
What is the exact combination of:
- CMAKE_CXX_COMPILER,
- CMAKE_CUDA_COMPILER,
- CMAKE_CUDA_HOST_COMPILER,
- NVCC_WRAPPER_DEFAULT_COMPILER,
- MPI_CXX_SKIP_MPICXX (and any MPI_* overrides)
that allows:
- CMake’s find_package(MPI) to detect a working mpicxx,
- Kokkos’s CUDA backend to compile all .cu and CUDA-templated .cpp files for whatever architecture is set by Kokkos_ARCH_GPUARCH,
- all of the above inside a Docker container, without manual flag hacking?
Any pointers to a minimal working Dockerfile or CMake flags combination would be greatly appreciated.
Thanks in advance!
[1] The exact error that made me switch was: “The application appears to have been direct launched using srun, but OMPI was not built with SLURM’s PMI support and therefore cannot execute.”
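For completeness, whether a given OpenMPI build actually ships the Slurm/PMI components can be checked with ompi_info (the exact component names vary between versions):
# List MCA components related to Slurm/PMI in the rebuilt OpenMPI:
/opt/hpcx/ompi/bin/ompi_info | grep -Ei "slurm|pmi"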