Just some quick feedback on how I resolved the problem, in case it helps other people reading this thread.
I went back and matched the Open MPI versions inside the container and on the host, setting both to openmpi-4.0.6. However, this alone did not resolve the problem.
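For anyone who wants to confirm that the two versions really match, here is a minimal sketch. The helper names ompi_version and same_version are my own, and the image name in the usage comment is only a placeholder; it assumes mpirun is on the PATH both on the host and inside the image.

```shell
# Sketch: compare the Open MPI version on the host with the one in the image.

# Pull the first dotted version number (e.g. 4.0.6) out of text on stdin.
ompi_version() {
    grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Report whether two version strings are identical (non-zero exit on mismatch).
same_version() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: host=$1 container=$2"
        return 1
    fi
}

# Example usage (image name is a placeholder):
#   host_v=$(mpirun --version | ompi_version)
#   cont_v=$(apptainer exec mpi4Lammps.sif mpirun --version | ompi_version)
#   same_version "$host_v" "$cont_v"
```

The functions only parse text, so they can be tried on a saved copy of the mpirun --version output as well.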
Then, quite by chance, as I was looking at this page:
I noticed that I had a -C flag on the Apptainer command inside the mpirun command used to launch the job:
mpirun -n 2 apptainer exec -C --nv -B $PWD:/host_pwd sboxGpuLammps-cu-11-0 /lammps/build/./lmp -k on g 1 -sf kk -pk kokkos -in /host_pwd/in.lj.txt >> log.lammps
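This was one of the key culprits. The -C flag is short for --containall, which isolates the PID and IPC namespaces and the environment in addition to the file systems. Once I removed this unnecessary flag, I got the expected behavior. It is better to use the following command to launch the job: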
mpirun -n 2 apptainer run --nv -B $PWD:/host_pwd --pwd /host_pwd mpi4Lammps.sif /lammps/build/lmp -k on g 1 -sf kk -pk kokkos -in /host_pwd/in.lj.txt
The --pwd /host_pwd option is very useful, arguably required, for directing the output to a directory of your choice.
Other useful notes regarding the definition files that come with LAMMPS in …/lammps/tools/singularity/ubuntu18.04_*.def:
The libvtk6-dev, libnetcdf and libpnetcdf libraries, when installed from the Ubuntu 18.04 repo, also pull in mpi-default-bin and mpi-default-dev, which are openmpi-2.1.1, even after the explicit MPI install commands have been removed from the definition file. So I opted to exclude the VTK and NetCDF packages from the install. To include them, one would likely need to build the libvtk6-dev, libnetcdf and libpnetcdf libraries from source, rather than installing them from the Ubuntu 18.04 repo, after installing the MPI package of choice. Of course, all of this applies only if the MPI version on the host differs from the default openmpi-2.1.1 installed from the Ubuntu 18.04 repo.
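One way to see this before building the image is to let apt simulate the install and filter for MPI packages. This is only a sketch: the function name mpi_packages_pulled_in is made up, and it relies on the "Inst <package> ..." lines that apt-get -s (simulate) prints for each package it would install.

```shell
# Sketch: list MPI-related packages that a simulated apt install would add.
# Reads the output of `apt-get -s install ...` on stdin and prints the names
# of packages on "Inst" lines whose name contains mpi-default or openmpi.
mpi_packages_pulled_in() {
    awk '$1 == "Inst" && $2 ~ /(mpi-default|openmpi)/ { print $2 }'
}

# Example usage inside an Ubuntu 18.04 container:
#   apt-get -s install libvtk6-dev | mpi_packages_pulled_in
```

An empty result would suggest that the package does not drag in the distribution's Open MPI; any output names the packages to watch out for.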
Also, the newer releases of LAMMPS require cmake-3.16 or higher to build the Kokkos package, while the definition files pull cmake-3.10 from the Ubuntu repos. To get around this, one needs to install a newer CMake (3.16 or higher) from the tar file, as in:
###########################################################################
# CMAKE
###########################################################################
cd /
wget https://cmake.org/files/v3.23/cmake-3.23.2.tar.gz
tar -xvf cmake-3.23.2.tar.gz
cd cmake-3.23.2
./bootstrap && make -j 16 && make install
apt-get update
###########################################################################
However, if one removes the installation of libvtk6-dev, libnetcdf and libpnetcdf from the *.def file, then one additionally needs to install the openssl and libssl-dev packages to complete the build of CMake-3.23.2.
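If the same definition file is reused on a newer base image, a small guard can skip the source build when the installed CMake is already recent enough. This is a sketch rather than part of the LAMMPS definition files: version_ge and cmake_version_from are hypothetical helper names, and the comparison relies on GNU sort -V, which is available on Ubuntu.

```shell
# Sketch: decide whether the installed CMake meets a minimum version.

# version_ge A B: succeed when version A is greater than or equal to B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version number from `cmake --version` text on stdin.
cmake_version_from() {
    sed -n 's/^cmake version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example usage in the %post section:
#   have=$(cmake --version 2>/dev/null | cmake_version_from)
#   if [ -n "$have" ] && version_ge "$have" 3.16; then
#       echo "cmake $have is new enough, skipping source build"
#   else
#       echo "building CMake from source"   # run the wget/bootstrap steps here
#   fi
```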
The public key (7fa2af80.pub) for the cuda-repos in the definition files is old and needs to be replaced with a newer one (3bf863cc.pub):
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
apt-get update
The same applies to the Radeon Open Compute repo as well. Here it appears that the online location of the gpg-key has changed. The updated one that I used is:
curl -sL http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add -
printf "deb [arch=amd64] http://repo.radeon.com/rocm/apt/4.3/ xenial main" > /etc/apt/sources.list.d/rocm.list