GPU Package Compilation

Hi everybody,

I am trying to compile the GPU package of LAMMPS. After compiling and installing the GPU library, I tried to compile LAMMPS, but I get the following error. I would appreciate any help.
CMake Error at Modules/StyleHeaderUtils.cmake:173 (message):

########################################################################

Found package(s) installed by the make-based build system

Please run

make -C
/mmfs1/home/user_name/LAMMPS_GPU/try/031022/lammps-20Sep2021/src
no-all purge

to uninstall

########################################################################
Call Stack (most recent call first):
CMakeLists.txt:528 (DetectBuildSystemConflict)

– Configuring incomplete, errors occurred!
See also “/mmfs1/home/user_name/LAMMPS_GPU/try/031022/lammps-20Sep2021/build/CMakeFiles/CMakeOutput.log”.

You are mixing two different methods of compiling LAMMPS and that cannot work.

If you use CMake to build LAMMPS, you must not build the GPU library separately and must not run "make yes-gpu" in the src folder. Instead, you enable the GPU package by adding -DPKG_GPU=yes, together with additional -D settings on the command line as needed (for example, to switch from the default OpenCL to CUDA, or from the default mixed precision to single or double precision), or interactively via cmake-gui or ccmake.

So you either need to unpack/clone the LAMMPS sources again and discard your current tree, or run the command suggested by the error message. After that you can configure with CMake according to the CMake(!) build instructions in the LAMMPS manual.
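For illustration only (a sketch, not a recipe for your specific cluster; adjust the GPU architecture, precision, and paths and check the manual), a GPU-enabled CMake configuration could look like this:

mkdir build
cd build
# optionally add -C ../cmake/presets/most.cmake to enable a standard set of packages
cmake -D PKG_GPU=yes -D GPU_API=cuda -D GPU_ARCH=sm_80 -D GPU_PREC=double ../cmake
cmake --build . --parallel 8    # or: make -j 8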

Dear Axel,

Thank you very much for your explanation. I did what you suggested, and this time I get the following error; can you please guide me again? Thank you.

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_CUDA_LIBRARY (ADVANCED)
linked by target “gpu” in directory /mmfs1/home/sarah.ghazanfari/LAMMPS_GPU/try/031022/lammps-20Sep2021/cmake
linked by target “nvc_get_devices” in directory /mmfs1/home/sarah.ghazanfari/LAMMPS_GPU/try/031022/lammps-20Sep2021/cmake

CMake Warning at CMakeLists.txt:142 (add_executable):
Cannot generate a safe runtime search path for target lmp because files in
some directories may conflict with libraries in implicit directories:

runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
  /mmfs1/apps/intel/psxe/2020/intelpython3/lib
runtime library [libzstd.so.1] in /usr/lib64 may be hidden by files in:
  /mmfs1/apps/intel/psxe/2020/intelpython3/lib

Some of these libraries may not be found correctly.

– Generating done
CMake Generate step failed. Build files cannot be regenerated correctly.

Can you please provide the complete CMake output?

Hello Axel,

Thank you very much for your time and help. Here is the whole CMake output.
loading initial cache file …/cmake/presets/most.cmake
– The CXX compiler identification is Intel 19.1.3.20200925
– Detecting CXX compiler ABI info
– Detecting CXX compiler ABI info - done
– Check for working CXX compiler: /mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiicc - skipped
– Detecting CXX compile features
– Detecting CXX compile features - done
– Found Git: /usr/bin/git (found version “2.27.0”)
– Appending /mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/cuda-11.1.0-ovkhraqgea52ujcgzb7d5mwjlco5vo2o/lib64:/mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/fftw-3.3.8-3ylozdyqb3ff2t44i6me6esvdrpaua7x/lib:/mmfs1/apps/intel/psxe/2020/clck/2019.10/lib/intel64:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/ipp/lib/intel64:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/daal/lib/intel64_lin:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/daal/…/tbb/lib/intel64_lin/gcc4.4:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/daal/…/tbb/lib/intel64_lin/gcc4.8:/mmfs1/apps/intel/psxe/2020/lib to CMAKE_LIBRARY_PATH: /mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/cuda-11.1.0-ovkhraqgea52ujcgzb7d5mwjlco5vo2o/lib64:/mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/fftw-3.3.8-3ylozdyqb3ff2t44i6me6esvdrpaua7x/lib:/mmfs1/apps/intel/psxe/2020/clck/2019.10/lib/intel64:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/ipp/lib/intel64:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/daal/lib/intel64_lin:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/daal/…/tbb/lib/intel64_lin/gcc4.4:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/daal/…/tbb/lib/intel64_lin/gcc4.8:/mmfs1/apps/intel/psxe/2020/lib
– Running check for auto-generated files from make-based build system
– Found MPI_CXX: /mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiicc (found version “3.1”)
– Found MPI: TRUE (found version “3.1”)
– Looking for C++ include omp.h
– Looking for C++ include omp.h - found
– Found OpenMP_CXX: -qopenmp (found version “5.0”)
– Found OpenMP: TRUE (found version “5.0”)
– Found GZIP: /usr/bin/gzip
– Could NOT find FFMPEG (missing: FFMPEG_EXECUTABLE)
– Found PkgConfig: /usr/bin/pkg-config (found version “1.4.2”)
– Checking for module ‘fftw3’
– Found fftw3, version 3.3.8
– Found FFTW3: /mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/fftw-3.3.8-3ylozdyqb3ff2t44i6me6esvdrpaua7x/lib/libfftw3.so
– Found Python: /mmfs1/apps/intel/psxe/2020/intelpython3/include/python3.7m (found version “3.7.7”) found components: Development Development.Module Development.Embed
– Found Cythonize: /mmfs1/apps/intel/psxe/2020/intelpython3/bin/cythonize
– Could NOT find VORO (missing: VORO_LIBRARY VORO_INCLUDE_DIR)
– Voro++ download requested - we will build our own
– Could NOT find Eigen3 (missing: Eigen3_DIR)
– Eigen3 download requested - we will build our own
– Found ZLIB: /usr/lib64/libz.so (found version “1.2.11”)
– Checking for module ‘libzstd>=1.4’
– Found libzstd, version 1.4.2
– Looking for C++ include cmath
– Looking for C++ include cmath - found
– Looking for C++ include pthread.h
– Looking for C++ include pthread.h - found
– Performing Test CMAKE_HAVE_LIBC_PTHREAD
– Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
– Found Threads: TRUE
– Found CUDA: /mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/cuda-11.1.0-ovkhraqgea52ujcgzb7d5mwjlco5vo2o (found version “11.1”)
CMake Warning at CMakeLists.txt:597 (message):
Plugin loading will not work unless BUILD_SHARED_LIBS is enabled

– Generating style headers…
– Generating package headers…
– Generating lmpinstalledpkgs.h…
– The Fortran compiler identification is Intel 19.1.3.20200925
– Detecting Fortran compiler ABI info
– Detecting Fortran compiler ABI info - done
– Check for working Fortran compiler: /mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiifort - skipped
– Checking whether /mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiifort supports Fortran 90
– Checking whether /mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiifort supports Fortran 90 - yes
– The C compiler identification is Intel 19.1.3.20200925
– Detecting C compiler ABI info
– Detecting C compiler ABI info - done
– Check for working C compiler: /mmfs1/apps/intel/psxe/2020/bin/icc - skipped
– Detecting C compile features
– Detecting C compile features - done
– Found Python: /mmfs1/apps/intel/psxe/2020/intelpython3/bin/python3.7 (found version “3.7.7”) found components: Interpreter
– Could NOT find ClangFormat (missing: ClangFormat_EXECUTABLE) (Required is at least version “8.0”)
– The following tools and libraries have been found and configured:

  • Git
  • MPI
  • OpenMP
  • FFTW3
  • Cythonize
  • ZLIB
  • PkgConfig
  • Threads
  • CUDA
  • Python

– <<< Build configuration >>>
Operating System: Linux Red Hat Enterprise Linux 8.2
Build type: RelWithDebInfo
Install path: /mmfs1/home/sarah.ghazanfari/LAMMPS_GPU/try/031022/lammps-20Sep2021
Generator: Unix Makefiles using /usr/bin/gmake
– Enabled packages: ASPHERE;BOCS;BODY;BROWNIAN;CG-DNA;CG-SDK;CLASS2;COLLOID;COLVARS;COMPRESS;CORESHELL;DIELECTRIC;DIFFRACTION;DIPOLE;DPD-BASIC;DPD-MESO;DPD-REACT;DPD-SMOOTH;DRUDE;EFF;EXTRA-COMPUTE;EXTRA-DUMP;EXTRA-FIX;EXTRA-MOLECULE;EXTRA-PAIR;FEP;GPU;GRANULAR;INTERLAYER;KSPACE;MACHDYN;MANYBODY;MC;MEAM;MISC;ML-IAP;ML-SNAP;MOFFF;MOLECULE;OPENMP;OPT;ORIENT;PERI;PHONON;PLUGIN;POEMS;PYTHON;QEQ;REACTION;REAXFF;REPLICA;RIGID;SHOCK;SPH;SPIN;SRD;TALLY;UEF;VORONOI;YAFF
– <<< Compilers and Flags: >>>
– C++ Compiler: /mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiicc
Type: Intel
Version: 19.1.3.20200925
C++ Flags: -restrict -O2 -g -DNDEBUG
Defines: LAMMPS_SMALLBIG;LAMMPS_MEMALIGN=64;LAMMPS_OMP_COMPAT=4;LAMMPS_GZIP;FFT_FFTW3;LMP_PYTHON;MLIAP_PYTHON;LAMMPS_ZSTD;LMP_OPENMP;LMP_GPU
Options: -xHost
– Fortran Compiler: /mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiifort
Type: Intel
Version: 19.1.3.20200925
Fortran Flags: -O2 -g
– C compiler: /mmfs1/apps/intel/psxe/2020/bin/icc
Type: Intel
Version: 19.1.3.20200925
C Flags: -O2 -g -DNDEBUG
– <<< Linker flags: >>>
– Executable name: lmp_mpi
– Static library flags:
– <<< MPI flags >>>
– MPI_defines: MPICH_SKIP_MPICXX;OMPI_SKIP_MPICXX;_MPICC_H
– MPI includes:
– MPI libraries: ;
– <<< GPU package settings >>>
– GPU API: CUDA
– CUDA Compiler: /mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/cuda-11.1.0-ovkhraqgea52ujcgzb7d5mwjlco5vo2o/bin/nvcc
– GPU default architecture: sm_80
– GPU binning with CUDPP: OFF
– CUDA MPS support: OFF
– GPU precision: DOUBLE
– <<< FFT settings >>>
– Primary FFT lib: FFTW3
– Using double precision FFTs
– Using non-threaded FFTs
– <<< Building Tools >>>
– Configuring done
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_CUDA_LIBRARY (ADVANCED)
linked by target “gpu” in directory /mmfs1/home/sarah.ghazanfari/LAMMPS_GPU/try/031022/lammps-20Sep2021/cmake
linked by target “nvc_get_devices” in directory /mmfs1/home/sarah.ghazanfari/LAMMPS_GPU/try/031022/lammps-20Sep2021/cmake

CMake Warning at CMakeLists.txt:142 (add_executable):
Cannot generate a safe runtime search path for target lmp because files in
some directories may conflict with libraries in implicit directories:

runtime library [libz.so.1] in /usr/lib64 may be hidden by files in:
  /mmfs1/apps/intel/psxe/2020/intelpython3/lib
runtime library [libzstd.so.1] in /usr/lib64 may be hidden by files in:
  /mmfs1/apps/intel/psxe/2020/intelpython3/lib

Some of these libraries may not be found correctly.

– Generating done
CMake Warning:
Manually-specified variables were not used by the project:

PKG_USER-OMP

CMake Generate step failed. Build files cannot be regenerated correctly.

This is a known problem. You are trying to compile LAMMPS on a machine that has no CUDA "driver" installed. Most likely you are compiling on the login node of an HPC cluster, and only (some of) the compute nodes have GPUs and the corresponding driver installed. For modern CUDA toolkit versions, this should not be a problem, since they ship a so-called "stub" version of the driver library, but the CUDA support in CMake does not find it, and hence you get the error.

We have found a workaround with the following change to the CMake scripting in LAMMPS:

diff --git a/cmake/Modules/Packages/GPU.cmake b/cmake/Modules/Packages/GPU.cmake
index fe15917f47..aec8887c30 100644
--- a/cmake/Modules/Packages/GPU.cmake
+++ b/cmake/Modules/Packages/GPU.cmake
@@ -30,7 +30,15 @@ file(GLOB GPU_LIB_SOURCES ${LAMMPS_LIB_SOURCE_DIR}/gpu/[^.]*.cpp)
 file(MAKE_DIRECTORY ${LAMMPS_LIB_BINARY_DIR}/gpu)
 
 if(GPU_API STREQUAL "CUDA")
-  find_package(CUDA REQUIRED)
+  find_package(CUDA QUIET)
+  # augment search path for CUDA toolkit libraries to include the stub versions. Needed to find libcuda.so on machines without a CUDA driver installation
+  if(CUDA_FOUND)
+    set(CMAKE_LIBRARY_PATH "${CUDA_TOOLKIT_ROOT_DIR}/lib64/stubs;${CUDA_TOOLKIT_ROOT_DIR}/lib/stubs;${CUDA_TOOLKIT_ROOT_DIR}/lib64;${CUDA_TOOLKIT_ROOT_DIR}/lib;${CMAKE_LIBRARY_PATH}")
+    find_package(CUDA REQUIRED)
+  else()
+    message(FATAL_ERROR "CUDA Toolkit not found")
+  endif()
+
   find_program(BIN2C bin2c)
   if(NOT BIN2C)
     message(FATAL_ERROR "Could not find bin2c, use -DBIN2C=/path/to/bin2c to help cmake finding it.")

I am not certain whether this will apply cleanly to the specific LAMMPS version that you are using, though. Probably the best approach would be to wait until we release a new version of LAMMPS later this week, which will have the workaround included, and then use that version.
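If you nevertheless want to try it with your current tree, the diff above can be applied from the top level of the LAMMPS source directory, for example (assuming you saved it to a file called gpu-stubs.patch):

patch -p1 < gpu-stubs.patch
# or, if your tree is a git clone:
# git apply gpu-stubs.patch

Afterwards re-run CMake from an empty build directory.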

Hello Axel,
Thank you very much for your suggestions. I tried both options, but I get the following error. I would appreciate it if you can suggest another solution.
loading initial cache file …/cmake/presets/most.cmake
– The CXX compiler identification is Intel 19.1.3.20200925
– Detecting CXX compiler ABI info
– Detecting CXX compiler ABI info - done
– Check for working CXX compiler: /mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiicc - skipped
– Detecting CXX compile features
– Detecting CXX compile features - done
– Found Git: /usr/bin/git (found version “2.27.0”)
– Appending /mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/cuda-11.1.0-ovkhraqgea52ujcgzb7d5mwjlco5vo2o/lib64:/mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/fftw-3.3.8-3ylozdyqb3ff2t44i6me6esvdrpaua7x/lib:/mmfs1/apps/intel/psxe/2020/clck/2019.10/lib/intel64:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/ipp/lib/intel64:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/daal/lib/intel64_lin:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/daal/…/tbb/lib/intel64_lin/gcc4.4:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/daal/…/tbb/lib/intel64_lin/gcc4.8:/mmfs1/apps/intel/psxe/2020/lib to CMAKE_LIBRARY_PATH: /mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/cuda-11.1.0-ovkhraqgea52ujcgzb7d5mwjlco5vo2o/lib64:/mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/fftw-3.3.8-3ylozdyqb3ff2t44i6me6esvdrpaua7x/lib:/mmfs1/apps/intel/psxe/2020/clck/2019.10/lib/intel64:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/ipp/lib/intel64:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/daal/lib/intel64_lin:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/daal/…/tbb/lib/intel64_lin/gcc4.4:/mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/daal/…/tbb/lib/intel64_lin/gcc4.8:/mmfs1/apps/intel/psxe/2020/lib
– Running check for auto-generated files from make-based build system
– Found MPI_CXX: /mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiicc (found version “3.1”)
– Found MPI: TRUE (found version “3.1”)
– Looking for C++ include omp.h
– Looking for C++ include omp.h - found
– Found OpenMP_CXX: -qopenmp (found version “5.0”)
– Found OpenMP: TRUE (found version “5.0”)
– Found GZIP: /usr/bin/gzip
– Could NOT find FFMPEG (missing: FFMPEG_EXECUTABLE)
– Found PkgConfig: /usr/bin/pkg-config (found version “1.4.2”)
– Checking for module ‘fftw3’
– Found fftw3, version 3.3.8
– Found FFTW3: /mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/fftw-3.3.8-3ylozdyqb3ff2t44i6me6esvdrpaua7x/lib/libfftw3.so
– Found Python: /mmfs1/apps/intel/psxe/2020/intelpython3/include/python3.7m (found version “3.7.7”) found components: Development Development.Module Development.Embed
– Found Cythonize: /mmfs1/apps/intel/psxe/2020/intelpython3/bin/cythonize
– Could NOT find VORO (missing: VORO_LIBRARY VORO_INCLUDE_DIR)
– Voro++ download requested - we will build our own
– Could NOT find Eigen3 (missing: Eigen3_DIR)
– Eigen3 download requested - we will build our own
– Found ZLIB: /usr/lib64/libz.so (found version “1.2.11”)
– Checking for module ‘libzstd>=1.4’
– Found libzstd, version 1.4.2
– Looking for C++ include cmath
– Looking for C++ include cmath - found
CMake Error in Modules/Packages/GPU.cmake:
A logical block opening on the line

/mmfs1/home/sarah.ghazanfari/LAMMPS_GPU/try/031022/lammps-20Sep2021/cmake/Modules/Packages/GPU.cmake:31 (if)

is not closed.
Call Stack (most recent call first):
CMakeLists.txt:589 (include)

CMake Warning at CMakeLists.txt:597 (message):
Plugin loading will not work unless BUILD_SHARED_LIBS is enabled

– Generating style headers…
– Generating package headers…
– Generating lmpinstalledpkgs.h…
– The Fortran compiler identification is Intel 19.1.3.20200925
– Detecting Fortran compiler ABI info
– Detecting Fortran compiler ABI info - done
– Check for working Fortran compiler: /mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiifort - skipped
– Checking whether /mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiifort supports Fortran 90
– Checking whether /mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiifort supports Fortran 90 - yes
– The C compiler identification is Intel 19.1.3.20200925
– Detecting C compiler ABI info
– Detecting C compiler ABI info - done
– Check for working C compiler: /mmfs1/apps/intel/psxe/2020/bin/icc - skipped
– Detecting C compile features
– Detecting C compile features - done
– Found Python: /mmfs1/apps/intel/psxe/2020/intelpython3/bin/python3.7 (found version “3.7.7”) found components: Interpreter
– Could NOT find ClangFormat (missing: ClangFormat_EXECUTABLE) (Required is at least version “8.0”)
– The following tools and libraries have been found and configured:

  • Git
  • MPI
  • OpenMP
  • FFTW3
  • Cythonize
  • ZLIB
  • PkgConfig
  • Python

– <<< Build configuration >>>
Operating System: Linux Red Hat Enterprise Linux 8.2
Build type: RelWithDebInfo
Install path: /mmfs1/home/sarah.ghazanfari/LAMMPS_GPU/try/031022/lammps-20Sep2021
Generator: Unix Makefiles using /usr/bin/gmake
– Enabled packages: ASPHERE;BOCS;BODY;BROWNIAN;CG-DNA;CG-SDK;CLASS2;COLLOID;COLVARS;COMPRESS;CORESHELL;DIELECTRIC;DIFFRACTION;DIPOLE;DPD-BASIC;DPD-MESO;DPD-REACT;DPD-SMOOTH;DRUDE;EFF;EXTRA-COMPUTE;EXTRA-DUMP;EXTRA-FIX;EXTRA-MOLECULE;EXTRA-PAIR;FEP;GPU;GRANULAR;INTERLAYER;KSPACE;MACHDYN;MANYBODY;MC;MEAM;MISC;ML-IAP;ML-SNAP;MOFFF;MOLECULE;OPENMP;OPT;ORIENT;PERI;PHONON;PLUGIN;POEMS;PYTHON;QEQ;REACTION;REAXFF;REPLICA;RIGID;SHOCK;SPH;SPIN;SRD;TALLY;UEF;VORONOI;YAFF
– <<< Compilers and Flags: >>>
– C++ Compiler: /mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiicc
Type: Intel
Version: 19.1.3.20200925
C++ Flags: -restrict -O2 -g -DNDEBUG
Defines: LAMMPS_SMALLBIG;LAMMPS_MEMALIGN=64;LAMMPS_OMP_COMPAT=4;LAMMPS_GZIP;FFT_FFTW3;LMP_PYTHON;MLIAP_PYTHON;LAMMPS_ZSTD;LMP_OPENMP;LMP_GPU
Options: -xHost
– Fortran Compiler: /mmfs1/apps/intel/psxe/2020/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiifort
Type: Intel
Version: 19.1.3.20200925
Fortran Flags: -O2 -g
– C compiler: /mmfs1/apps/intel/psxe/2020/bin/icc
Type: Intel
Version: 19.1.3.20200925
C Flags: -O2 -g -DNDEBUG
– <<< Linker flags: >>>
– Executable name: lmp_mpi
– Static library flags:
– <<< MPI flags >>>
– MPI_defines: MPICH_SKIP_MPICXX;OMPI_SKIP_MPICXX;_MPICC_H
– MPI includes:
– MPI libraries: ;
– <<< GPU package settings >>>
– GPU API: CUDA
– CUDA Compiler:
– GPU default architecture: sm_80
– GPU binning with CUDPP:
– CUDA MPS support:
– GPU precision: DOUBLE
– <<< FFT settings >>>
– Primary FFT lib: FFTW3
– Using double precision FFTs
– Using non-threaded FFTs
– <<< Building Tools >>>
– Configuring incomplete, errors occurred!
See also “/mmfs1/home/sarah.ghazanfari/LAMMPS_GPU/try/031022/lammps-20Sep2021/build/CMakeFiles/CMakeOutput.log”.

You cannot have tried both options, since we released the new version only just now.
And the CMake error clearly indicates that you did not apply the change from the patch correctly: the if() block opened in GPU.cmake is never closed, i.e. the endif() line is missing.
Your folder names suggest that you are using the 20 September 2021 version, which is neither the latest stable nor the latest patch release.
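If you do not want to wait for the tarball on lammps.org, one way to get the current releases (assuming git is available on your machine; the directory name is just an example) is:

# latest patch release
git clone -b release https://github.com/lammps/lammps.git mylammps
# or the latest stable release:
# git clone -b stable https://github.com/lammps/lammps.git mylammps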

Dear Axel,
By "both" I meant compiling on a GPU node and changing the CMake file. Thank you for letting me know about the new version. I downloaded it, successfully installed LAMMPS, and added the GPU package. Can you please share with me an example that uses the GPU package? I would greatly appreciate it if you can help me one more time.

There are detailed explanations in the LAMMPS manual about how to use the GPU package, and there are plenty of examples in the LAMMPS source code distribution (not all of them allow GPU acceleration, but that, too, can be found out from the manual).

Dear Axel,
I am trying to run one of the GPU examples, but I get the following error. I am using the latest version of LAMMPS. Thank you.
ERROR: Unrecognized fix style ‘gpu’ (src/modify.cpp:878)
Last command: fix gpu all gpu force 0 0 1.0

Please provide the entire input file, the command line you used, and the complete output, not just the error message.

Hi Axel,
Thank you for your response. Attached are the input file, the output file, and the script for running LAMMPS:

# stick a buckyball into a nanotube
units           real
dimension       3
boundary       f f f
atom_style      molecular
newton          off
package gpu 1
processors * * 1

# read topology 
read_data       data.bucky-plus-cnt

pair_style  lj/cut/gpu  10.0
bond_style  harmonic
angle_style charmm
dihedral_style charmm

special_bonds lj/coul 0.0 0.0 0.0

pair_coeff  1  1    0.07    3.55
pair_coeff  1  2    0.07    3.55
pair_coeff  2  2    0.07    3.55
bond_coeff     1  305.0     1.4
angle_coeff    1   40.000 120.00   35.00   2.41620
dihedral_coeff 1    3.100   2     180   0.0

neighbor        2.0 bin
neigh_modify    delay 0 every 1 check yes

timestep        2.0

# required for GPU acceleration
fix   gpu  all      gpu  force 0 0 1.0

# we only move some atoms.
group mobile type 1

# have balls bounce off the walls
fix     walls       mobile wall/reflect xlo EDGE ylo EDGE zlo EDGE xhi EDGE yhi EDGE zhi EDGE

velocity mobile create 303.0 46659 mom yes rot yes dist gaussian

# take some potential energy out of the system
minimize 1.0e-4 1.0e-6 100 1000

reset_timestep 0

fix     integrate   mobile nve
fix     thermostat  mobile langevin 300.0 300.0 2000.0 234624

# IMD setup.
fix  comm       all imd 6789 unwrap on trate 10
#fix  comm       all imd 6789 unwrap on trate 10 nowait on

# temperature is based on mobile atoms only
compute mobtemp mobile temp
thermo_style    custom step pe ke evdwl emol c_mobtemp spcpu
thermo          1000
thermo_modify   norm yes
thermo_modify   temp mobtemp

run             100000000
#############
LAMMPS (24 Mar 2022)
  using 1 OpenMP thread(s) per MPI task
# stick a buckyball into a nanotube
units           real
dimension       3
boundary       f f f
atom_style      molecular
newton          off
package gpu 1
processors * * 1

# read topology
read_data       data.bucky-plus-cnt
Reading data file ...
  orthogonal box = (-35 -30 -6) to (45 30 6)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  1000 atoms
  scanning bonds ...
  3 = max bonds/atom
  scanning angles ...
  9 = max angles/atom
  scanning dihedrals ...
  24 = max dihedrals/atom
  reading bonds ...
  1485 bonds
  reading angles ...
  2940 angles
  reading dihedrals ...
  5830 dihedrals
Finding 1-2 1-3 1-4 neighbors ...
  special bond factors lj:    0        0        0       
  special bond factors coul:  0        0        0       
     3 = max # of 1-2 neighbors
     6 = max # of 1-3 neighbors
    18 = max # of 1-4 neighbors
    18 = max # of special neighbors
  special bonds CPU = 0.009 seconds
  read_data CPU = 0.025 seconds

pair_style  lj/cut/gpu  10.0
bond_style  harmonic
angle_style charmm
dihedral_style charmm

special_bonds lj/coul 0.0 0.0 0.0

pair_coeff  1  1    0.07    3.55
pair_coeff  1  2    0.07    3.55
pair_coeff  2  2    0.07    3.55
bond_coeff     1  305.0     1.4
angle_coeff    1   40.000 120.00   35.00   2.41620
dihedral_coeff 1    3.100   2     180   0.0

neighbor        2.0 bin
neigh_modify    delay 0 every 1 check yes

timestep        2.0

# required for GPU acceleration
fix   gpu  all      gpu  force 0 0 1.0
ERROR: Unrecognized fix style 'gpu' (src/modify.cpp:878)
Last command: fix   gpu  all      gpu  force 0 0 1.0
############
#!/bin/bash
#PBS -q gpus
#PBS -N lammp
#PBS -l select=1:mem=4gb:ncpus=1:ngpus=1
#PBS -l walltime=1:00:00
##replace "x-ccast-prj" below with "x-ccast-prj-[your sponsor's project group]"
#PBS -W group_list=x-ccast-prj-wxia

## fftw/3.3.7-gcc may be needed for certain packages, e.g., KSPACE
module load fftw/3.3.8-gcc-3ylo
## load Intel Parallel Studio 2020
module load intel-parallel-studio/cluster.2020.4-gcc-vcxt
## load LAMMPS version 27May2021
module load lammps/27May2021-intel

cd $PBS_O_WORKDIR

export NUM_PROC=`cat $PBS_NODEFILE | wc -l`

## change the input filename as needed
INPUT=in.bucky-plus-cnt-gpu
mylammps=/mmfs1/home/sarah.ghazanfari/LAMMPS_GPU/03312022/lammps-24Mar2022/build
mpirun -np $NUM_PROC $mylammps/lmp_mpi < $INPUT > output.txt

exit 0

This is an input file from the stone ages, i.e. from about 2011-2012. Where did you find it?
I recognize it well. The syntax of LAMMPS has changed in a few places since then; for example, there is no "fix gpu" command anymore.

I suggest that, rather than running old inputs that are likely broken and need updates, you follow the documentation and adapt a current input (like those in the bench or examples folders) for use with the GPU package. In most cases, they don't need to be edited at all.
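To sketch what I mean (illustrative only, not a tested rewrite of that particular input): in current LAMMPS versions GPU acceleration is selected through the package command and the gpu suffix, not through a fix, so the "fix gpu" line is simply dropped:

# near the top of the input, as before:
package     gpu 1
# either use the /gpu pair style variant explicitly ...
pair_style  lj/cut/gpu 10.0
# ... or keep the plain style and activate the suffix instead:
# suffix      gpu
# pair_style  lj/cut 10.0

# the same can be done without editing the input at all, via command-line switches:
# mpirun -np 2 lmp_mpi -sf gpu -pk gpu 1 -in in.file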

Dear Axel,
Thanks for your message. I took this example from the latest version of LAMMPS, in the following directory:
/mmfs1/home/sarah.ghazanfari/LAMMPS_GPU/try/031022/lammps-24Mar2022/examples/PACKAGES/imd
Which example are you suggesting I run? (From which directory of the latest version of LAMMPS?)

I would start with the "bench" folder, in.lj first, as that is the simplest.

It runs successfully, but I am looking for a GPU example. I want to test the GPU. I installed the GPU package and I don't know how to use it. Can you please guide me? Thank you very much.

This means that you have not read the documentation, or at least not carefully enough. It is crucial that you have a good understanding of how GPU acceleration works, what its requirements are, how input files and command lines do or do not need to be modified, and so on. Otherwise you are very likely to just waste time and resources.

The in.lj input runs perfectly fine on the GPU. Please compare the following two runs, which I did just now: the first with 2 MPI ranks on the CPU only, the second with the same 2 MPI ranks sharing 1 GPU, both with the identical in.lj input file!
And if you want to have some fun, use the -var flag to run a 4x4x4 replicated system (more info in the README file and in the manual).
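For orientation, the command lines for such runs look roughly like this (a sketch; the MPI launcher, executable name, and number of ranks/GPUs depend on your machine):

# CPU-only baseline on 2 MPI ranks
mpirun -np 2 ./lmp_mpi -in in.lj
# identical input with GPU acceleration: gpu suffix plus package gpu with 1 device
mpirun -np 2 ./lmp_mpi -sf gpu -pk gpu 1 -in in.lj
# optional 4x4x4 replicated system, using the variables defined in the bench input
mpirun -np 2 ./lmp_mpi -sf gpu -pk gpu 1 -var x 4 -var y 4 -var z 4 -in in.lj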

LAMMPS (24 Mar 2022)
  using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
Created orthogonal box = (0 0 0) to (33.591924 33.591924 33.591924)
  1 by 1 by 2 MPI processor grid
Created 32000 atoms
  using lattice units in orthogonal box = (0 0 0) to (33.591924 33.591924 33.591924)
  create_atoms CPU = 0.001 seconds
  generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
  update every 20 steps, delay 0 steps, check no
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 2.8
  ghost atom cutoff = 2.8
  binsize = 1.4, bins = 24 24 24
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair lj/cut, perpetual
      attributes: half, newton on
      pair build: half/bin/atomonly/newton
      stencil: half/bin/3d
      bin: standard
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 7.885 | 7.885 | 7.885 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press     
         0   1.44          -6.7733681      0             -4.6134356     -5.0197073    
       100   0.7574531     -5.7585055      0             -4.6223613      0.20726105   
Loop time of 0.699057 on 2 procs for 100 steps with 32000 atoms

Performance: 61797.576 tau/day, 143.050 timesteps/s
99.4% CPU use with 2 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.57277    | 0.57707    | 0.58136    |   0.6 | 82.55
Neigh   | 0.090927   | 0.091356   | 0.091785   |   0.1 | 13.07
Comm    | 0.012837   | 0.017531   | 0.022225   |   3.5 |  2.51
Output  | 6.1438e-05 | 6.5781e-05 | 7.0124e-05 |   0.0 |  0.01
Modify  | 0.011152   | 0.011202   | 0.011252   |   0.0 |  1.60
Other   |            | 0.001836   |            |       |  0.26

Nlocal:          16000 ave       16001 max       15999 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Nghost:        13632.5 ave       13635 max       13630 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Neighs:         601416 ave      605200 max      597633 min
Histogram: 1 0 0 0 0 0 0 0 0 1

Total # of neighbors = 1202833
Ave neighs/atom = 37.588531
Neighbor list builds = 5
Dangerous builds not checked
Total wall time: 0:00:00
LAMMPS (24 Mar 2022)
  using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
Created orthogonal box = (0 0 0) to (33.591924 33.591924 33.591924)
  1 by 1 by 2 MPI processor grid
Created 32000 atoms
  using lattice units in orthogonal box = (0 0 0) to (33.591924 33.591924 33.591924)
  create_atoms CPU = 0.004 seconds

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- GPU package (short-range, long-range and three-body potentials):
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE


--------------------------------------------------------------------------
- Using acceleration for lj/cut:
-  with 2 proc(s) per device.
-  with OpenCL Parameters for: NVIDIA_GPU (203)
-  Horizontal vector operations: ENABLED
-  Shared memory system: No
--------------------------------------------------------------------------
Device 0: NVIDIA GeForce GTX 1060 6GB, 10 CUs, 5.9 GB, 1.7 GHZ (Double Precision)
--------------------------------------------------------------------------

Initializing Device and compiling on process 0...Done.
Initializing Device 0 on core 0...Done.
Initializing Device 0 on core 1...Done.

  generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 5.797 | 5.797 | 5.797 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press     
         0   1.44          -6.7733681      0             -4.6134356     -5.0197073    
       100   0.7574531     -5.7585055      0             -4.6223613      0.20726105   
Loop time of 0.18253 on 2 procs for 100 steps with 32000 atoms

Performance: 236673.451 tau/day, 547.855 timesteps/s
99.1% CPU use with 2 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.1561     | 0.15816    | 0.16022    |   0.5 | 86.65
Neigh   | 1.359e-06  | 1.4795e-06 | 1.6e-06    |   0.0 |  0.00
Comm    | 0.014604   | 0.016503   | 0.018403   |   1.5 |  9.04
Output  | 6.5723e-05 | 0.00011551 | 0.00016531 |   0.0 |  0.06
Modify  | 0.0058443  | 0.0059254  | 0.0060065  |   0.1 |  3.25
Other   |            | 0.001826   |            |       |  1.00

Nlocal:          16000 ave       16001 max       15999 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Nghost:        13632.5 ave       13635 max       13630 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Neighs:              0 ave           0 max           0 min
Histogram: 2 0 0 0 0 0 0 0 0 0

Total # of neighbors = -2
Ave neighs/atom = -6.25e-05
Neighbor list builds = 5
Dangerous builds not checked


---------------------------------------------------------------------
      Device Time Info (average): 
---------------------------------------------------------------------
Data Transfer:   0.0723 s.
Neighbor copy:   0.0064 s.
Neighbor build:  0.0233 s.
Force calc:      0.0482 s.
Device Overhead: 0.0309 s.
Average split:   1.0000.
Lanes / atom:    4.
Vector width:    32.
Max Mem / Proc:  14.17 MB.
CPU Neighbor:    0.0020 s.
CPU Cast/Pack:   0.0131 s.
CPU Driver_Time: 0.0053 s.
CPU Idle_Time:   0.1442 s.
---------------------------------------------------------------------

Total wall time: 0:00:00

Dear Axel,
Thank you very much for your help. Here is my output. One question: why doesn't mine report the GPU type the way yours does ("with OpenCL Parameters for: NVIDIA_GPU (203)")?

LAMMPS (24 Mar 2022)
  using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
Created orthogonal box = (0 0 0) to (33.591924 33.591924 33.591924)
  1 by 1 by 2 MPI processor grid
Created 32000 atoms
  using lattice units in orthogonal box = (0 0 0) to (33.591924 33.591924 33.591924)
  create_atoms CPU = 0.002 seconds

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- GPU package (short-range, long-range and three-body potentials):
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE


--------------------------------------------------------------------------
- Using acceleration for lj/cut:
-  with 2 proc(s) per device.
-  Horizontal vector operations: ENABLED
-  Shared memory system: No
--------------------------------------------------------------------------
Device 0: NVIDIA A100-PCIE-40GB, 108 CUs, 39/40 GB, 1.4 GHZ (Double Precision)
--------------------------------------------------------------------------

Initializing Device and compiling on process 0...Done.
Initializing Device 0 on core 0...Done.
Initializing Device 0 on core 1...Done.

  generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 5.808 | 5.808 | 5.808 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press     
         0   1.44          -6.7733681      0             -4.6134356     -5.0197073    
       100   0.7574531     -5.7585055      0             -4.6223613      0.20726105   
Loop time of 0.103154 on 2 procs for 100 steps with 32000 atoms

Performance: 418792.749 tau/day, 969.428 timesteps/s
98.8% CPU use with 2 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.046712   | 0.061137   | 0.075561   |   5.8 | 59.27
Neigh   | 8.9e-07    | 1.0415e-06 | 1.193e-06  |   0.0 |  0.00
Comm    | 0.014138   | 0.028436   | 0.042734   |   8.5 | 27.57
Output  | 9.2873e-05 | 0.00021463 | 0.00033638 |   0.0 |  0.21
Modify  | 0.009617   | 0.0096439  | 0.0096707  |   0.0 |  9.35
Other   |            | 0.003721   |            |       |  3.61

Nlocal:          16000 ave       16001 max       15999 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Nghost:        13632.5 ave       13635 max       13630 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Neighs:              0 ave           0 max           0 min
Histogram: 2 0 0 0 0 0 0 0 0 0

Total # of neighbors = -2
Ave neighs/atom = -6.25e-05
Neighbor list builds = 5
Dangerous builds not checked


---------------------------------------------------------------------
      Device Time Info (average): 
---------------------------------------------------------------------
Data Transfer:   0.0121 s.
Neighbor copy:   0.0003 s.
Neighbor build:  0.0035 s.
Force calc:      0.0374 s.
Device Overhead: 0.0631 s.
Average split:   1.0000.
Lanes / atom:    4.
Vector width:    32.
Max Mem / Proc:  14.17 MB.
CPU Neighbor:    0.0019 s.
CPU Cast/Pack:   0.0126 s.
CPU Driver_Time: 0.0018 s.
CPU Idle_Time:   0.0464 s.
---------------------------------------------------------------------

Total wall time: 0:00:00

Because you didn't compile for OpenCL but for CUDA, so there are no OpenCL parameters to report; with CUDA you must have an NVIDIA GPU anyway.
Because my executable uses OpenCL, it can also run on Intel or AMD GPUs (but it is a tiny bit slower on NVIDIA GPUs). Again, this is all explained at great length in the manual.
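For reference, the backend is chosen when the GPU package is configured, via the GPU_API setting (the values below are examples; see the GPU package build instructions in the manual):

# portable OpenCL build (runs on NVIDIA, AMD, and Intel GPUs)
cmake -D PKG_GPU=yes -D GPU_API=opencl -D GPU_PREC=mixed ../cmake
# CUDA build (NVIDIA only), e.g. targeting an A100
cmake -D PKG_GPU=yes -D GPU_API=cuda -D GPU_ARCH=sm_80 -D GPU_PREC=double ../cmake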

I will close this discussion now. You have finally been able to use the GPU and my goodwill to help you is exhausted.