Building LAMMPS on ROCm Docker image for PyTorch

hmcezar · January 15, 2024, 3:21pm

I am building a Singularity container to run LAMMPS with pair_allegro on LUMI, which has AMD GPUs.

Using the ROCm Docker images as a starting point, I successfully built LAMMPS with Kokkos.
CMake, make, and make install all run fine.

However, when I try to run a simple lmp --help I get:

/usr/bin/lmp: error while loading shared libraries: libomp.so: cannot open shared object file: No such file or directory

Do you have any ideas on how to solve this and get a working version of LAMMPS?

Below is the definition file for the Singularity image:

bootstrap: docker
from: rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_1.13.1

%post
    # Install software
    apt-get update
    apt-get install -y file g++ gcc gfortran make gdb strace wget ca-certificates git --no-install-recommends

    # Install mpich
    wget -q http://www.mpich.org/static/downloads/3.1.4/mpich-3.1.4.tar.gz
    tar xf mpich-3.1.4.tar.gz
    cd mpich-3.1.4
    ./configure --disable-fortran --enable-fast=all,O3 --prefix=/usr
    make -j$(nproc)
    make install
    ldconfig
    cd ..
    rm -rf mpich-3.1.4 mpich-3.1.4.tar.gz

    # Clone and install NequIP and Allegro
    pip install wandb
    git clone --depth 1 https://github.com/mir-group/nequip.git
    cd nequip
    sed -i 's/"torch>=1.10.0,<1.13,!=1.9.0",/"torch>=1.10.0",/g' setup.py
    pip install .
    cd ..
    git clone --depth 1 https://github.com/mir-group/allegro.git
    cd allegro
    pip install .
    cd ..
    rm -rf nequip allegro

    # Clone pair_allegro and LAMMPS
    export PYTORCH_ROCM_ARCH=gfx90a
    git clone -b stable_2Aug2023_update2 --depth 1 https://github.com/lammps/lammps.git
    git clone -b multicut --depth 1 https://github.com/mir-group/pair_allegro.git
    cd pair_allegro
    ./patch_lammps.sh ../lammps
    cd ..
    export TORCH_CMAKE_PATH=$(python -c 'import torch;print(torch.utils.cmake_prefix_path)')
    cd lammps
    mkdir build
    cd build
    cmake -C ../cmake/presets/basic.cmake -C ../cmake/presets/kokkos-hip.cmake \
        -D PKG_KOKKOS=yes -D Kokkos_ENABLE_OPENMP=yes -D BUILD_OMP=yes \
        -D Kokkos_ARCH_ZEN3=yes -D Kokkos_ARCH_VEGA90A=yes -D Kokkos_ENABLE_HIP=yes \
        -D HIP_PATH=/opt/rocm -D CMAKE_PREFIX_PATH="$TORCH_CMAKE_PATH" \
        -D CMAKE_INSTALL_PREFIX=/usr ../cmake
    make -j$(nproc)
    make install
    cd ../..
    rm -rf pair_allegro lammps

stamoor · January 15, 2024, 3:57pm

You probably need to set LD_LIBRARY_PATH to include the directory that contains that library, or disable OpenMP support.
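One way to follow the LD_LIBRARY_PATH suggestion is to locate libomp.so inside the container first and then prepend its directory. This is a minimal sketch, assuming a POSIX shell with GNU find; the helper name find_lib_dir is hypothetical, and the search directories (in particular /opt/rocm/llvm/lib, where ROCm images usually ship LLVM's OpenMP runtime) are assumptions to adjust for your image:

```shell
#!/bin/sh
# Hypothetical helper: print the directory of the first file named $1 found
# under the remaining arguments. Prints nothing if no match is found.
find_lib_dir() {
    lib=$1
    shift
    # 2>/dev/null hides errors for search roots that do not exist;
    # "|| true" keeps a missing root from failing the assignment.
    path=$(find "$@" -name "$lib" -print -quit 2>/dev/null || true)
    if [ -n "$path" ]; then
        dirname "$path"
    fi
}

# Assumed search roots; adjust for your image.
omp_dir=$(find_lib_dir libomp.so /opt/rocm/llvm/lib /opt/rocm/lib /usr/lib /usr/local/lib)

if [ -n "$omp_dir" ]; then
    # Prepend the directory while keeping any existing LD_LIBRARY_PATH entries.
    export LD_LIBRARY_PATH="$omp_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
else
    echo "libomp.so not found in the searched directories" >&2
fi
```

In a Singularity image, the resulting export line could be placed in the definition file's %environment section so that it applies every time the container runs. Disabling OpenMP, the other option, removes the runtime dependency altogether.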

hmcezar · January 15, 2024, 4:06pm

How can I find out which libomp LAMMPS is linked against?

Also, shouldn’t CMake take care of this and point to the right libraries?

stamoor · January 15, 2024, 4:24pm

This is a runtime issue, not a compile issue, so it is not related to CMake. Typically, loading a module sets these paths, or the system libraries are already installed in a known default path. I remember seeing this issue on OLCF Crusher, which is similar to LUMI. I would recommend disabling OpenMP in LAMMPS, since it won’t be used with the Kokkos HIP backend, e.g. with -D BUILD_OMP=off.
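Before and after rebuilding with -D BUILD_OMP=off, it may be worth checking what CMake actually cached. The build above also passes -D Kokkos_ENABLE_OPENMP=yes and loads the kokkos-hip.cmake preset, and it is an assumption (not stated in this thread) that either of these could keep OpenMP in the link even with BUILD_OMP off. This is a minimal sketch that reads values from CMakeCache.txt; both function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the cached value of CMake variable $2 from the
# CMakeCache.txt in build directory $1 (e.g. "ON" or "OFF").
cache_value() {
    sed -n "s/^$2:[A-Z]*=//p" "$1/CMakeCache.txt"
}

# Hypothetical check: succeed only if neither LAMMPS' own OpenMP support
# (BUILD_OMP) nor the Kokkos OpenMP backend is enabled in build directory $1.
check_no_openmp() {
    for var in BUILD_OMP Kokkos_ENABLE_OPENMP; do
        # CMake booleans are case-insensitive, so normalize to lower case.
        val=$(cache_value "$1" "$var" | tr '[:upper:]' '[:lower:]')
        case "$val" in
            on|yes|true|y|1)
                echo "$var is still enabled ($val)" >&2
                return 1
                ;;
        esac
    done
    echo "OpenMP is disabled in $1"
}
```

For example, after re-running cmake in lammps/build with -D BUILD_OMP=off, run check_no_openmp . from that directory. If it reports Kokkos_ENABLE_OPENMP, that option would presumably also need to be turned off for the libomp dependency to disappear.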

Or you could report the issue to the system admins on LUMI; this is not a LAMMPS issue but rather a system issue on LUMI (and probably OLCF Frontier). I reported this issue to the OLCF admins but in the end I just disabled OpenMP like I described above.

akohlmey · January 15, 2024, 4:31pm

/usr/bin/lmp: error while loading shared libraries: libomp.so: cannot open shared object file: No such file or directory

Are you running LAMMPS from within the container?
If so, the compiler inside the container is not set up correctly.
What is odd, however, is that it looks for libomp.so and not libomp.so.1 (or some other trailing number). So this may be a dependency that is not pulled in directly by LAMMPS but by some other library that was not built correctly. A file like libgomp.so is usually only used for linking; that library then has an “SONAME” property pointing to libgomp.so.1 (or equivalent), which is the runtime library. Usually the two are the same file and the former is a symlink to the latter, but they need not be.
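To test whether libomp.so is requested by another library rather than by lmp itself, glibc's dynamic loader can report which object asked for each library when LD_DEBUG=files is set. The loader writes these lines before it fails, so this still works when loading ultimately breaks. This is a minimal sketch, assuming a glibc-based image (as in the Ubuntu 20.04 ROCm image). The helper name who_needs is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: run a command with glibc's LD_DEBUG=files tracing and
# print each object that asked the loader for library $1.
# Usage: who_needs LIBNAME COMMAND [ARGS...]
who_needs() {
    lib=$1
    shift
    # The loader writes lines such as
    #   file=libomp.so [0];  needed by /path/to/libfoo.so [0]
    # to stderr. Send the command's stdout to /dev/null, keep its stderr,
    # and reduce each matching line to the requester's path.
    LD_DEBUG=files "$@" 2>&1 >/dev/null \
        | grep "file=$lib \[.*needed by" \
        | sed -e 's/.*needed by //' -e 's/ \[[0-9]*\]$//' \
        | sort -u
}
```

For example, who_needs libomp.so /usr/bin/lmp -h should show whether the unversioned libomp.so is requested by lmp itself or by one of its dependencies.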

You can check this by running ldd lmp | grep omp in the build folder (from within the container).
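The ldd check can be wrapped so that it also lists every dependency the loader cannot resolve, not only the OpenMP ones. This is a minimal sketch. The function names are hypothetical, and the parsing assumes glibc's ldd output format ("libname => not found"):

```shell
#!/bin/sh
# Hypothetical helper: show the OpenMP-related dependencies of binary $1 and
# where they resolve to, or a note if there are none.
omp_deps() {
    ldd "$1" | grep -i omp || echo "no OpenMP runtime among the dependencies of $1"
}

# Hypothetical helper: print the names of the dependencies of binary $1
# that ldd reports as "not found".
missing_deps() {
    ldd "$1" | awk '/not found/ { print $1 }'
}
```

Running omp_deps ./lmp in the build folder and omp_deps /usr/bin/lmp for the installed copy, then comparing the two, shows whether the library is found only from the build tree.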

CMake does take care of this during the build, and for executables in the build folder it also embeds explicit runtime library paths (RPATH), but those are removed upon “make install” (and for good reasons).
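You can check whether a binary carries such embedded paths with readelf -d from binutils, which prints any RPATH or RUNPATH entry in the dynamic section. This is a minimal sketch. The function name is hypothetical, and the sed expression assumes GNU sed and the usual readelf output layout:

```shell
#!/bin/sh
# Hypothetical helper: print the RPATH/RUNPATH string embedded in ELF file $1.
# Prints nothing if the file has neither entry (or readelf is unavailable).
embedded_rpath() {
    # readelf -d lines look like:
    #  0x000000000000001d (RUNPATH)  Library runpath: [/some/dir:/other/dir]
    readelf -d "$1" 2>/dev/null \
        | sed -n 's/.*(\(RPATH\|RUNPATH\)).*\[\(.*\)\]$/\2/p'
}
```

For example, embedded_rpath lammps/build/lmp should list directories, while embedded_rpath /usr/bin/lmp is expected to print nothing after make install.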