CMake fails to find a working MPI when building with Kokkos+CUDA in Docker

Hello everyone,

I’m hitting a blocker when trying to compile LAMMPS with Kokkos’s CUDA backend inside Docker. I use OpenMPI 4.1.7 installed under /opt/hpcx/ompi, replacing the built-in HPC-X MPI installation because that one doesn’t properly support Slurm[1]. The base image is provided by NVIDIA, since it was the quickest starting point I found with CUDA and PyTorch properly installed.
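
For context, the OpenMPI rebuild amounts to roughly the following (a sketch, not the literal Dockerfile step; the --with-pmi path in particular depends on where Slurm’s PMI headers live on the build host):

# configure OpenMPI 4.1.7 with Slurm/PMI support so srun can direct-launch
./configure --prefix=/opt/hpcx/ompi \
            --with-slurm \
            --with-pmi=/usr    # adjust to the Slurm PMI install prefix
make -j"$(nproc)" && make install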

The reason I opted for Docker is that the image needs to run on a cluster where installing the whole environment manually is rather convoluted, especially for PyTorch. I’d also like a fully reproducible build, no matter what machine I’m using (after properly adjusting the architecture flags).

Environment setup

  • MPI: OpenMPI 4.1.7 built & installed to /opt/hpcx/ompi

  • Docker ENV:

      ENV CC=/opt/hpcx/ompi/bin/mpicc \
              CXX=/opt/hpcx/ompi/bin/mpicxx \
              MPICC=/opt/hpcx/ompi/bin/mpicc \
              MPICXX=/opt/hpcx/ompi/bin/mpicxx \
              MPI_CXX_COMPILER=/opt/hpcx/ompi/bin/mpicxx \
              MPI_C_COMPILER=/opt/hpcx/ompi/bin/mpicc \
              NVCC_WRAPPER_DEFAULT_COMPILER=/opt/hpcx/ompi/bin/mpicxx
    
  • Kokkos: Installed under /opt/lammps/lib/kokkos

CMake invocation

cmake ../cmake \
  -D CMAKE_INSTALL_PREFIX=/opt/lammps \
  -D CMAKE_PREFIX_PATH="/usr/local/lib/python3.12/dist-packages/torch/share/cmake;/opt/hpcx/ompi" \
  -D CMAKE_C_COMPILER=$CC \
  -D CMAKE_CXX_COMPILER=$CXX \
  -D CMAKE_CUDA_COMPILER=/opt/lammps/lib/kokkos/bin/nvcc_wrapper \
  -D BUILD_MPI=ON \
  -D Kokkos_ENABLE_CUDA=ON \
  -D Kokkos_ENABLE_OPENMP=ON \
  -D Kokkos_ARCH_SKX=ON \
  -D Kokkos_ARCH_PASCAL61=ON \
  # ... other flags

You can have a look at the full Dockerfile here.

Error

-- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS)
CMake Error: Could NOT find MPI (missing: MPI_CXX_FOUND)
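
For anyone digging into the same failure: the actual compiler error behind the failed check is recorded in CMake’s own log, e.g.

# run from inside the build directory; on CMake >= 3.26 the failed
# try_compile is logged to CMakeFiles/CMakeConfigureLog.yaml instead
less CMakeFiles/CMakeError.log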

What I’ve tried

  • -D MPI_CXX_SKIP_MPICXX=TRUE plus manual
    MPI_CXX_ADDITIONAL_INCLUDE_DIRS/MPI_CXX_LIBRARIES overrides (sketched below)
  • Removing all wrapper hacks (OMPI_CXX, MPICH_CXX)
  • Swapping CMAKE_CXX_COMPILER between mpicxx and nvcc_wrapper
  • Explicitly setting CMAKE_CUDA_HOST_COMPILER=$CXX
  • Overriding the wrapper compiler nvcc_wrapper uses via NVCC_WRAPPER_DEFAULT_COMPILER
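
For instance, the override attempt from the first bullet looked roughly like this (a sketch; the include and library paths are the ones from my image):

cmake ../cmake \
  -D MPI_CXX_SKIP_MPICXX=TRUE \
  -D MPI_CXX_ADDITIONAL_INCLUDE_DIRS=/opt/hpcx/ompi/include \
  -D MPI_CXX_LIBRARIES=/opt/hpcx/ompi/lib/libmpi.so \
  -D BUILD_MPI=ON # plus the Kokkos/CUDA flags from above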

None of these attempts made MPI_CXX_WORKS turn TRUE when nvcc_wrapper is in play.

Important note: if I completely disable CUDA support in Kokkos, the image builds without problems. So I don’t think the problem lies with OpenMPI or the compilers.

Background

  • The Kokkos manual recommends using nvcc_wrapper as a drop-in C++ compiler and overriding the host compiler via NVCC_WRAPPER_DEFAULT_COMPILER or -ccbin (see the sketch after this list).
  • A GitHub issue reports that setting variables like OMPI_CXX breaks nvcc_wrapper’s mechanism, which in turn makes CMake fail when looking for a valid MPI installation.
  • The LAMMPS package that specifically requires the PyTorch library is ML-MACE. The suggested way to install and use it with LAMMPS is documented here.
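
For reference, the recommended nvcc_wrapper usage from the first bullet boils down to something like this (a sketch; conftest.cpp is just a stand-in source file):

# nvcc_wrapper acts as a drop-in C++ compiler; the host compiler is chosen
# via the env var below (or, equivalently, nvcc's -ccbin flag)
export NVCC_WRAPPER_DEFAULT_COMPILER=/usr/bin/g++
/opt/lammps/lib/kokkos/bin/nvcc_wrapper -O2 -std=c++17 -c conftest.cpp -o conftest.o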

What is the exact combination of:

CMAKE_CXX_COMPILER,
CMAKE_CUDA_COMPILER,
CMAKE_CUDA_HOST_COMPILER,
NVCC_WRAPPER_DEFAULT_COMPILER,
MPI_CXX_SKIP_MPICXX (and any MPI_* overrides)

that allows:

  1. CMake’s find_package(MPI) to detect a working mpicxx,
  2. Kokkos’s CUDA backend to compile all .cu and CUDA-templated .cpp files for whatever architecture is set by Kokkos_ARCH_GPUARCH,
  3. all of this working inside a Docker container, without manual flag hacking?

Any pointers to a minimal working Dockerfile or CMake flags combination would be greatly appreciated.

Thanks in advance!


  1. The exact error that made me switch was “The application appears to have been direct launched using srun, but OMPI was not built with SLURM’s PMI support and therefore cannot execute.” ↩︎

I suspect that these settings are not correct. They should instead point to the actual compiler and not the MPI wrapper.

CMake uses the MPI compiler wrapper only to extract the settings (i.e. the location of mpi.h and the libraries that need to be linked), but then uses the underlying host compiler and not the compiler wrapper.
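
In other words, something like this (a sketch of the idea, not a full invocation):

# host compilers go to CMake directly; the MPI wrappers are passed only
# as hints so FindMPI can extract include paths and link libraries
cmake ../cmake \
  -D CMAKE_C_COMPILER=/usr/bin/gcc \
  -D CMAKE_CXX_COMPILER=/usr/bin/g++ \
  -D MPI_C_COMPILER=/opt/hpcx/ompi/bin/mpicc \
  -D MPI_CXX_COMPILER=/opt/hpcx/ompi/bin/mpicxx \
  -D BUILD_MPI=ON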

Thank you Axel for the hints. I changed those two vars as you suggested, but that turned out not to be the whole problem. I had to make the following changes to get the image building correctly.

First, the full set of correct env variables:

ENV CC=/usr/bin/gcc \
    CXX=/usr/bin/g++ \
    MPICC=/opt/hpcx/ompi/bin/mpicc \
    MPICXX=/opt/hpcx/ompi/bin/mpicxx \
    CUDA_HOME=/usr/local/cuda \
    CUDA_LIB_PATH=/usr/local/cuda-12.8/compat/lib.real \
    CMAKE_CUDA_COMPILER=/opt/lammps/lib/kokkos/bin/nvcc_wrapper

The last two were critical.

The second thing is a fix that took me quite a while to find, and I still don’t fully understand why it’s needed. I had to apply the following patch to cmake/CMakeLists.txt:

--- cmake/CMakeLists.txt.orig	2025-06-16 11:02:36.977050977 +0200
+++ cmake/CMakeLists.txt	2025-06-16 11:02:45.398044421 +0200
@@ -130,9 +130,9 @@
 endif()
 
 # silence nvcc warnings
-if((PKG_KOKKOS) AND (Kokkos_ENABLE_CUDA) AND NOT (CMAKE_CXX_COMPILER_ID STREQUAL "Clang"))
-  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}" "-Xcudafe --diag_suppress=unrecognized_pragma,--diag_suppress=128")
-endif()
+#if((PKG_KOKKOS) AND (Kokkos_ENABLE_CUDA) AND NOT (CMAKE_CXX_COMPILER_ID STREQUAL "Clang"))
+#  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}" "-Xcudafe --diag_suppress=unrecognized_pragma,--diag_suppress=128")
+#endif()
 
 # we *require* C++11 without extensions but prefer C++17.
 # Kokkos requires at least C++17 (currently)

If those flags are added, the Could NOT find MPI_CXX (missing: MPI_CXX_WORKS) error comes back. My guess is that those flags are not supported by the host C++ compiler I’m using, which makes CMake’s compile tests fail. In fact, I read that the supposedly correct way would be to set CMAKE_CXX_COMPILER to nvcc_wrapper (which never worked for me). If you have any hint…
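
If it helps, a less invasive alternative to commenting the block out might be to probe the flag before adding it (an untested sketch):

# only add the nvcc-only flags if the active C++ compiler accepts them;
# plain g++ rejects -Xcudafe, while nvcc_wrapper forwards it to nvcc
include(CheckCXXCompilerFlag)
check_cxx_compiler_flag("-Xcudafe --diag_suppress=unrecognized_pragma" CXX_ACCEPTS_XCUDAFE)
if((PKG_KOKKOS) AND (Kokkos_ENABLE_CUDA) AND CXX_ACCEPTS_XCUDAFE)
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Xcudafe --diag_suppress=unrecognized_pragma,--diag_suppress=128")
endif()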

I still have to run and test the container, especially with MPI, but having the image build at all is already good progress.

Thanks again!