Cuda driver error 700 in call at file '/home/sfj/870_QVO/Download/lammps-23Jun2022/lib/gpu/geryon/nvd_timer.h' in line 76

Dear all:
I’m trying to compile GPU-accelerated LAMMPS for DPD simulations. I have tried many approaches, but the compiled LAMMPS binary cannot start a DPD simulation. My build process is listed below; I hope this information helps to solve the problem.

nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce 2060  Off  | 00000000:01:00.0 Off |                  N/A |
| 35%   37C    P0     1W / 170W |      0MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_PREC=mixed -D GPU_ARCH=sm_75 -D CUDPP_OPT=yes -D PKG_DPD-BASIC=on -D PKG_MOLECULE=on -D PKG_EXTRA-FIX=on -D CMAKE_INSTALL_PREFIX=/home/sfj/software/lammps/.local ../cmake

Run in lammps-23Jun2022/build, this prints:

-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1") 
-- Running check for auto-generated files from make-based build system
-- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/libmpichcxx.so (found version "4.0") 
-- Found MPI: TRUE (found version "4.0")  
-- Looking for C++ include omp.h
-- Looking for C++ include omp.h - found
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5") found components: CXX 
-- Found GZIP: /usr/bin/gzip  
-- Found FFMPEG: /usr/bin/ffmpeg  
-- Looking for C++ include cmath
-- Looking for C++ include cmath - found
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda-11.7 (found version "11.7") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Generating style headers...
-- Generating package headers...
-- Generating lmpinstalledpkgs.h...
-- Found ClangFormat: /usr/bin/clang-format (found suitable version "14.0.0", minimum required is "8.0") 
-- The following tools and libraries have been found and configured:
 * Git
 * MPI
 * Threads
 * CUDA
 * OpenMP
 * ClangFormat (required version >= 8.0)

-- <<< Build configuration >>>
   LAMMPS Version:   20220623
   Operating System: Linux Ubuntu 22.04
   CMake Version:    3.22.1
   Build type:       RelWithDebInfo
   Install path:     /home/sfj/software/lammps/.local
   Generator:        Unix Makefiles using /usr/bin/gmake
-- Enabled packages: DPD-BASIC;EXTRA-FIX;GPU;MOLECULE
-- <<< Compilers and Flags: >>>
-- C++ Compiler:     /usr/bin/c++
      Type:          GNU
      Version:       11.4.0
      C++ Flags:     -O2 -g -DNDEBUG
      Defines:       LAMMPS_SMALLBIG;LAMMPS_MEMALIGN=64;LAMMPS_OMP_COMPAT=4;LAMMPS_GZIP;LAMMPS_FFMPEG;LMP_GPU
-- <<< Linker flags: >>>
-- Executable name:  lmp
-- Static library flags:    
-- <<< MPI flags >>>
-- MPI_defines:      MPICH_SKIP_MPICXX;OMPI_SKIP_MPICXX;_MPICC_H
-- MPI includes:     /usr/include/x86_64-linux-gnu/mpich
-- MPI libraries:    /usr/lib/x86_64-linux-gnu/libmpichcxx.so;/usr/lib/x86_64-linux-gnu/libmpich.so;
-- <<< GPU package settings >>>
-- GPU API:                  CUDA
-- CUDA Compiler:            /usr/local/cuda-11.7/bin/nvcc
-- GPU default architecture: sm_75
-- GPU binning with CUDPP:   yes
-- CUDA MPS support:         OFF
-- GPU precision:            MIXED
-- Configuring done
-- Generating done
-- Build files have been written to: /home/sfj/870_QVO/Download/lammps-23Jun2022/build
make -j 16

The build completes, with two warning messages:

[ 87%] Building CXX object CMakeFiles/lammps.dir/home/sfj/870_QVO/Download/lammps-23Jun2022/src/write_restart.cpp.o
/home/sfj/870_QVO/Download/lammps-23Jun2022/src/variable.cpp: In member function ‘int LAMMPS_NS::Variable::next(int, char**)’:
/home/sfj/870_QVO/Download/lammps-23Jun2022/src/variable.cpp:744:14: warning: ignoring return value of ‘size_t fread(void*, size_t, size_t, FILE*)’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
  744 |         fread(buf,1,64,fp);
      |         ~~~~~^~~~~~~~~~~~~

[100%] Linking CXX executable lmp
lto-wrapper: warning: using serial compilation of 128 LTRANS jobs
make install
cd ../examples/PACKAGES/dpd-basic/dpd
mpirun -np 8 lmp -in in.dpd -sf gpu

error:

Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.01
Cuda driver error 700 in call at file '/home/sfj/870_QVO/Download/lammps-23Jun2022/lib/gpu/geryon/nvd_timer.h' in line 76.
Abort(-1) on node 7 (rank 7 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 7

When I use GPU_API=opencl instead of GPU_API=cuda:

cmake -D PKG_GPU=on -D GPU_API=opencl -D PKG_DPD-BASIC=on -D PKG_MOLECULE=on -D PKG_EXTRA-FIX=on -D CMAKE_INSTALL_PREFIX=/home/sfj/software/lammps/.local ../cmake
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1") 
-- Running check for auto-generated files from make-based build system
-- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/libmpichcxx.so (found version "4.0") 
-- Found MPI: TRUE (found version "4.0")  
-- Looking for C++ include omp.h
-- Looking for C++ include omp.h - found
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5") found components: CXX 
-- Found GZIP: /usr/bin/gzip  
-- Found FFMPEG: /usr/bin/ffmpeg  
-- Looking for C++ include cmath
-- Looking for C++ include cmath - found
-- Downloading and building OpenCL loader library
-- Downloading https://download.lammps.org/thirdparty/opencl-loader-2022.01.04.tar.gz
-- [download 12% complete]
-- [download 25% complete]
-- [download 37% complete]
-- [download 50% complete]
-- [download 63% complete]
-- [download 75% complete]
-- [download 88% complete]
-- [download 100% complete]
-- Unpacking and configuring opencl-loader-2022.01.04.tar.gz
-- The C compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Looking for secure_getenv
-- Looking for secure_getenv - found
-- Looking for __secure_getenv
-- Looking for __secure_getenv - not found
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Generating style headers...
-- Generating package headers...
-- Generating lmpinstalledpkgs.h...
-- Found ClangFormat: /usr/bin/clang-format (found suitable version "14.0.0", minimum required is "8.0") 
-- The following tools and libraries have been found and configured:
 * Git
 * MPI
 * Threads
 * OpenMP
 * ClangFormat (required version >= 8.0)

-- <<< Build configuration >>>
   LAMMPS Version:   20220623
   Operating System: Linux Ubuntu 22.04
   CMake Version:    3.22.1
   Build type:       RelWithDebInfo
   Install path:     /home/sfj/software/lammps/.local
   Generator:        Unix Makefiles using /usr/bin/gmake
-- Enabled packages: DPD-BASIC;EXTRA-FIX;GPU;MOLECULE
-- <<< Compilers and Flags: >>>
-- C++ Compiler:     /usr/bin/c++
      Type:          GNU
      Version:       11.4.0
      C++ Flags:     -O2 -g -DNDEBUG
      Defines:       LAMMPS_SMALLBIG;LAMMPS_MEMALIGN=64;LAMMPS_OMP_COMPAT=4;LAMMPS_GZIP;LAMMPS_FFMPEG;LMP_GPU
-- C compiler:       /usr/bin/cc
      Type:          
      Version:       
      C Flags:       -O2 -g -DNDEBUG
-- <<< Linker flags: >>>
-- Executable name:  lmp
-- Static library flags:    
-- <<< MPI flags >>>
-- MPI_defines:      MPICH_SKIP_MPICXX;OMPI_SKIP_MPICXX;_MPICC_H
-- MPI includes:     /usr/include/x86_64-linux-gnu/mpich
-- MPI libraries:    /usr/lib/x86_64-linux-gnu/libmpichcxx.so;/usr/lib/x86_64-linux-gnu/libmpich.so;
-- <<< GPU package settings >>>
-- GPU API:                  OPENCL
-- GPU precision:            MIXED
-- Configuring done
-- Generating done
-- Build files have been written to: /home/sfj/870_QVO/Download/lammps-23Jun2022/build
make -j 16

Warnings:

[ 87%] Building CXX object CMakeFiles/lammps.dir/home/sfj/870_QVO/Download/lammps-23Jun2022/src/DPD-BASIC/pair_dpd.cpp.o
/home/sfj/870_QVO/Download/lammps-23Jun2022/src/variable.cpp: In member function ‘int LAMMPS_NS::Variable::next(int, char**)’:
/home/sfj/870_QVO/Download/lammps-23Jun2022/src/variable.cpp:744:14: warning: ignoring return value of ‘size_t fread(void*, size_t, size_t, FILE*)’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
  744 |         fread(buf,1,64,fp);
      |         ~~~~~^~~~~~~~~~~~~
[100%] Linking CXX executable lmp
lto-wrapper: warning: using serial compilation of 128 LTRANS jobs
make install
cd ../examples/PACKAGES/dpd-basic/dpd
mpirun -np 8 lmp -in in.dpd -sf gpu

errors:

Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.01
OpenCL error in file '/home/sfj/870_QVO/Download/lammps-23Jun2022/lib/gpu/geryon/ocl_timer.h' in line 92 : -9999.
Abort(-1) on node 3 (rank 3 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3

I also tried different versions of the NVIDIA driver and CUDA, but the same error occurred. Forgive me for not documenting the full details of those attempts. One more piece of information that may be useful: earlier, I successfully compiled CUDA-accelerated GROMACS. Unlike the LAMMPS manual, the GROMACS manual gives information about the best compilation environment. This may be why I succeeded with GROMACS but not with LAMMPS.

Finally, thanks to everyone who provided technical help on the lammps forum.

First off, please check out the latest development snapshot from here: https://github.com/lammps/lammps/archive/refs/heads/develop.zip

If all goes well, this will become the next stable release within 24 hours. It has significant upgrades for the GPU package.

Second, try compiling with KOKKOS. The regular DPD pair styles also exist for KOKKOS.

The comparison with Gromacs is not quite comparing apples to apples. In many ways, Gromacs supports far less variety in features and functionality. Thus things can be streamlined and compilation is simpler.

Thanks a lot for your help. I compiled the development version with the same CMake procedure, and the error remained. I will try KOKKOS next and reply with the result later.

Another important check would be to try other input examples to see whether this failure is global or only for some special cases.
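One way to run that check systematically is sketched below. It only builds and prints one command line per input file, reusing the `mpirun ... lmp -in ... -sf gpu` invocation from the original post. The function name `build_commands`, the default of a single MPI rank, and the assumption that every input file is named `in.*` are illustrative choices, not something prescribed in this thread.

```python
import shlex
from pathlib import Path


def build_commands(examples_root, lmp="lmp", ranks=1):
    """Return (workdir, argv) pairs, one per in.* file below examples_root.

    Assumptions (not from the thread): inputs are named in.*, and one MPI
    rank is used so that a failure is easier to attribute to a single input.
    """
    root = Path(examples_root)
    if not root.is_dir():
        return []
    commands = []
    for infile in sorted(root.rglob("in.*")):
        if not infile.is_file():
            continue
        # Run from the input's own directory so relative file paths resolve.
        argv = ["mpirun", "-np", str(ranks), lmp, "-in", infile.name, "-sf", "gpu"]
        commands.append((infile.parent, argv))
    return commands


if __name__ == "__main__":
    for workdir, argv in build_commands("examples/PACKAGES/dpd-basic"):
        # Print a copy-pasteable shell line instead of executing anything.
        print(f"(cd {shlex.quote(str(workdir))} && {shlex.join(argv)})")
```

Running the printed lines one at a time shows which pair styles fail and which run, which separates a broken build from a bug in one specific kernel.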

I have made some attempts with the latest release of LAMMPS, lammps-stable_2Aug2023, but the GPU-accelerated DPD simulation still cannot be launched properly. I keep encountering CUDA driver errors (error 4, 1, or 700). I tested the four examples under examples/PACKAGES/dpd-basic and my own calculation example; only dpdext runs normally. I did successfully compile KOKKOS for GPU-accelerated DPD simulation. In addition, I tested the precompiled LAMMPS on Windows and found that its performance is better than KOKKOS on Ubuntu. Maybe this is caused by RTX’s poor support for double-precision calculations?

I feel that I cannot dedicate another week to resolving the installation issues with LAMMPS. Using a precompiled version of LAMMPS might be the final solution. Do you have a suggestion for a compilation environment? I want to make one last attempt.

That is not a fair comparison. The pre-compiled windows binaries use the GPU package in mixed precision (i.e. only accumulation of data in double precision) while KOKKOS (currently) requires double precision for everything.
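To illustrate what "only accumulation of data in double precision" buys, here is a toy sketch in plain Python. It is not LAMMPS or GPU package code: it emulates single precision by rounding through `struct`, and the helper names `f32` and `accumulate`, the term value 0.1, and the count of one million are all made up for the illustration.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(term, n, single_accumulator):
    """Add a single-precision term n times.

    single_accumulator=True  -> the running sum is also kept in single precision
    single_accumulator=False -> "mixed": single-precision terms, double-precision sum
    """
    term = f32(term)
    total = 0.0
    for _ in range(n):
        total = f32(total + term) if single_accumulator else total + term
    return total


n = 1_000_000
reference = n * f32(0.1)  # what the sum of the rounded terms should be
single = accumulate(0.1, n, single_accumulator=True)
mixed = accumulate(0.1, n, single_accumulator=False)

print(f"reference    : {reference:.4f}")
print(f"single sum   : {single:.4f}  (error {abs(single - reference):.4f})")
print(f"mixed sum    : {mixed:.4f}  (error {abs(mixed - reference):.4f})")
```

The terms are identical in both runs; only the precision of the running sum differs, and that alone decides whether rounding errors pile up. Mixed precision keeps most arithmetic in the single-precision units that consumer GPUs are fast at, which is consistent with the performance difference reported above.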

Compilation of LAMMPS for GPUs is tricky stuff. There are just too many moving parts:

  • the GPU hardware
  • the GPU drivers
  • the toolkit to compile the GPU kernels: CUDA, OpenCL, ROCm/HIP, SYCL, …
  • the option to compile for different precision
  • the need to support many different force kernels, ideally for all of the permutations listed above.

The best that people can do is to tell you what works for them.

FYI, one of the GPU package developers just noticed a bug in pair style dpd/gpu that was introduced in December 2022.

The fix is in pull request #3881, "Collected small changes and fixes" by akohlmey, in the lammps/lammps repository on GitHub, and will be available in the next feature release of LAMMPS. But it is quite trivial to edit the affected file yourself and give it a try.

Thank you for your attention to this topic. I made some more attempts and recorded the results here, hoping to help others.

In fact, the pre-compiled LAMMPS runs with an RTX 3060 Ti under Windows. I then installed the 3060 Ti in the Linux machine, and fortunately the GPU-accelerated DPD calculation runs successfully on Linux as well. :smile:

I also noticed some information that confuses me. The LAMMPS log contains the message "Device 0: NVIDIA GeForce RTX 3060 Ti, 38 CUs, 6.8/7.8 GB, 1.8 GHZ (Mixed Precision)", while nvidia-smi reports a video memory usage of 3896MiB / 8192MiB. Error 700 is a video-memory-related error, but I don’t know whether there is any relationship between these messages.
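For reference, the numbers in these messages are `CUresult` values from the CUDA driver API. The sketch below decodes the ones seen in this thread. The enum names come from the CUDA driver API; the one-line notes are paraphrases, and the helper name `decode` and the regular expression are illustrative assumptions.

```python
import re

# CUresult values from the CUDA driver API; the notes are paraphrased.
CUDA_DRIVER_ERRORS = {
    1: ("CUDA_ERROR_INVALID_VALUE",
        "an argument passed to the API call is outside the accepted range"),
    2: ("CUDA_ERROR_OUT_OF_MEMORY",
        "the driver could not allocate enough memory"),
    4: ("CUDA_ERROR_DEINITIALIZED",
        "the CUDA driver is shutting down"),
    700: ("CUDA_ERROR_ILLEGAL_ADDRESS",
          "a kernel loaded from or stored to an invalid memory address"),
}


def decode(log_line):
    """Return (code, name, note) for a 'Cuda driver error N' line, else None."""
    match = re.search(r"Cuda driver error (\d+)", log_line)
    if match is None:
        return None
    code = int(match.group(1))
    name, note = CUDA_DRIVER_ERRORS.get(code, ("(not in this table)", ""))
    return code, name, note


line = ("Cuda driver error 700 in call at file "
        "'/home/sfj/870_QVO/Download/lammps-23Jun2022/lib/gpu/geryon/nvd_timer.h' "
        "in line 76.")
print(decode(line))
```

Error 700 means an invalid memory access inside a kernel, not a shortage of memory, which would be error 2. So the 3896MiB / 8192MiB usage by itself probably does not explain it. Kernel faults are also reported by a later API call rather than by the faulting kernel, which may be why the message points at the timer code in nvd_timer.h.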