There is no error reported while building, but the code fails when I run it with:
"Cuda driver error 100 in call at file '/data/sourcecode/lammps/lammps-stable_29Sep2021/lib/gpu/geryon/nvd_device.h' in line 323"
I tried both the latest version and the previous one; the code always fails at "CU_SAFE_CALL_NS(cuInit(0))".
The build environment: CentOS 7, CUDA 11.1.0 (driver 455.23.05), devtoolset-9, and Intel 2018.
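To check whether the failure is independent of LAMMPS, I wrote a minimal sketch of my own (not part of LAMMPS) that calls the same CUDA driver entry point, cuInit(0), directly through Python's ctypes. It returns a status string instead of raising, so it is safe to run even on a machine without an NVIDIA driver:

```python
import ctypes

def cuda_driver_check():
    """Probe the CUDA driver the same way LAMMPS' geryon layer does:
    load libcuda and call cuInit(0), then count visible devices."""
    try:
        lib = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return "libcuda.so.1 not found (no driver installed?)"
    rc = lib.cuInit(0)
    if rc != 0:
        # Error 100 is CUDA_ERROR_NO_DEVICE, the code LAMMPS reports
        return f"cuInit failed with error {rc}"
    count = ctypes.c_int()
    lib.cuDeviceGetCount(ctypes.byref(count))
    return f"cuInit OK, {count.value} device(s) visible"

print(cuda_driver_check())
```

If this standalone check also fails with error 100, the problem is in the driver/device setup rather than in the LAMMPS build.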
Please provide the output of nvc_get_devices and of nvidia-smi.
What happens if you do not set GPU_ARCH?
Have you tried compiling for -D GPU_API=opencl? Do you get the same kind of error?
Yes, I set GPU_ARCH. I tried sm_75, sm_80, and sm_86 for the A40, but all failed.
I have not tried -D GPU_API=opencl; I haven't installed an OpenCL library, and CUDA's performance should be better on an Nvidia GPU card.
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
nvc_get_devices
Found 1 platform(s).
CUDA Driver Version: 11.20
Device 0: "A40"
Type of device: GPU
Compute capability: 8.6
Double precision support: Yes
Total amount of global memory: 44.5645 GB
Number of compute units/multiprocessors: 84
Number of cores: 16128
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum group size (# of threads per block) 1024 x 1024 x 64
Maximum item sizes (# threads for each dim) 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.74 GHz
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default
Concurrent kernel execution: Yes
Device has ECC support enabled: Yes
LAMMPS has its own OpenCL loader, so no additional software is needed. The CUDA driver (not the toolkit) comes with the required OpenCL runtime and ICD configuration file.
With the updates to the GPU package included in the 29Sep2021 stable release, the performance of OpenCL should be much better than in previous versions of the GPU package and in my tests it was comparable with the CUDA version. Besides, a CUDA version that crashes would not be faster than an OpenCL version that doesn’t.
I just tried -D GPU_API=opencl. The job still failed.
In the output:
LAMMPS (29 Sep 2021)
using 1 OpenMP thread(s) per MPI task
ERROR: Invalid OpenCL platform ID. (src/GPU/gpu_extra.h:77)
Last command: package gpu 0
It is just a test job:
3d Lennard-Jones melt
variable x index 1
variable y index 1
variable z index 1
The ERROR message:
terminate called after throwing an instance of 'std::runtime_error'
what(): cudaGetDeviceCount(&m_cudaDevCount) error( cudaErrorNoDevice): no CUDA-capable device is detected /data/sourcecode/lammps/lammps-20Sep2021/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:224
Traceback functionality not available
terminate called after throwing an instance of 'std::runtime_error'
what(): cudaGetDeviceCount(&m_cudaDevCount) error( cudaErrorNoDevice): no CUDA-capable device is detected /data/sourcecode/lammps/lammps-20Sep2021/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:224
Traceback functionality not available
It seems the Nvidia A40 GPU card cannot be recognized by LAMMPS.
I also tried the GPU version of VASP and a PyTorch-based code from our group; both run fine.
LAMMPS is completely agnostic to the details of how to access the hardware.
This is all delegated to the CUDA toolkit and the corresponding driver. The fact that it always fails when opening the device is a strong hint in that direction. This suggests that there is something inconsistent with your machine setup or that you are not using a CUDA toolkit version compatible with your specific GPU.
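One quick consistency check along those lines: ask the driver itself which CUDA version it supports and compare that with the toolkit version from nvcc --version (the driver must support a version at least as new as the toolkit). This is a sketch of mine, not a LAMMPS tool; it mirrors the "CUDA Driver Version" line that nvc_get_devices printed above and returns None where no driver is present:

```python
import ctypes

def driver_supported_cuda_version():
    """Return the highest CUDA version supported by the installed NVIDIA
    driver as a (major, minor) tuple, via cuDriverGetVersion, or None
    when no driver library is found."""
    try:
        lib = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None
    ver = ctypes.c_int()
    if lib.cuDriverGetVersion(ctypes.byref(ver)) != 0:
        return None
    # cuDriverGetVersion encodes e.g. CUDA 11.2 as the integer 11020
    return ver.value // 1000, (ver.value % 1000) // 10

print(driver_supported_cuda_version())
```

If the tuple printed here is older than the toolkit you compiled with, that mismatch alone would explain the failure.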
To follow this up in a more consistent way, I suggest you provide a suitable summary of this discussion (please also include the output of nvcc --version and gcc --version) and submit it as a "Bug report" issue at Issues · lammps/lammps · GitHub, so we can involve the people maintaining the relevant code as well as experts from Nvidia who know LAMMPS.