Dear users,
I'm running a simulation with the 14 May 2016 version of LAMMPS. With 8 MPI tasks, 4 OpenMP threads, and 1 GPU, it runs. If I try 8 MPI tasks, 4 OpenMP threads, and 2 GPUs, it gives the following error:
ERROR: Could not find/initialize a specified accelerator device (../gpu_extra.h:35)
Cuda driver error 4 in call at file 'geryon/nvd_device.h' in line 124.
Below are the relevant lines of my input file:
package omp 4
package gpu 2 neigh yes
pair_style lj/class2/coul/long/cuda 9.5
kspace_style pppm/cuda 0.00001
Kindly help me solve this issue.
What is your hardware configuration? What is the output of the
"nvc_get_devices" tool (in lib/gpu)?
Do you actually have two (physical) GPUs on each compute node?
Below are the relevant lines of my input file:
this is next to useless.
package omp 4
package gpu 2 neigh yes
pair_style lj/class2/coul/long/cuda 9.5
kspace_style pppm/cuda 0.00001
You cannot use /cuda styles with the gpu package. You have to use /gpu styles.
axel.
Thanks for your prompt reply, Dr. Axel.
Sorry for the typographical mistakes. The input file actually reads:
package omp 4
package gpu 2 neigh yes
pair_style lj/class2/coul/long/gpu 9.5
kspace_style pppm/gpu 0.00001
I’m using the /gpu styles only.
Output of ./nvc_get_devices
Found 1 platform(s).
Using platform: NVIDIA Corporation NVIDIA CUDA Driver
CUDA Driver Version: 7.50
Device 0: "Tesla K40c"
Type of device: GPU
Compute capability: 3.5
Double precision support: Yes
Total amount of global memory: 11.2496 GB
Number of compute units/multiprocessors: 15
Number of cores: 2880
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum group size (# of threads per block) 1024 x 1024 x 64
Maximum item sizes (# threads for each dim) 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 0.745 GHz
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default
Concurrent kernel execution: Yes
Device has ECC support enabled: Yes
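The listing above reports a single device (Device 0), while the input file requests two GPUs with "package gpu 2". As a rough illustration only, the following Python sketch counts the "Device N:" header lines in nvc_get_devices-style text and compares the count with the requested number. The function name count_devices and the shortened sample text are hypothetical and only for illustration; they are not part of LAMMPS or its tools.

```python
import re


def count_devices(listing):
    """Count 'Device N:' header lines in nvc_get_devices-style text."""
    return len(re.findall(r"^\s*Device\s+\d+:", listing, flags=re.MULTILINE))


# Shortened copy of the listing above (illustrative sample only).
sample = """Found 1 platform(s).
Using platform: NVIDIA Corporation NVIDIA CUDA Driver
CUDA Driver Version: 7.50
Device 0: "Tesla K40c"
Type of device: GPU
"""

requested = 2  # assumption: taken from `package gpu 2` in the input file
found = count_devices(sample)
if found < requested:
    print(f"requested {requested} GPUs but only {found} device(s) listed")
```

If the tool lists fewer devices than the input requests, the mismatch would be consistent with the "Could not find/initialize a specified accelerator device" error.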
But I have 3 physical GPUs.
Here is the output of "nvidia-smi":
Can you check the value of the environment variable CUDA_VISIBLE_DEVICES?
-Trung
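One possible way to act on this suggestion, sketched in Python under stated assumptions: CUDA_VISIBLE_DEVICES is a comma-separated list of device indices (GPU UUID strings are also accepted by CUDA), and when it is set only the listed devices are visible to CUDA programs. The helper names below are hypothetical, and the value 2 mirrors "package gpu 2" from the input file.

```python
import os


def visible_device_entries(value):
    """Return the device entries CUDA would consider, or None if unrestricted.

    `value` is the content of CUDA_VISIBLE_DEVICES (None when unset).
    """
    if value is None:
        return None  # variable unset: no restriction from the environment
    entries = []
    for item in value.split(","):
        item = item.strip()
        # Integer indices or GPU UUID strings are kept; the first entry that
        # is neither (for example -1 or an empty string) ends the list.
        if item.isdigit() or item.startswith("GPU-"):
            entries.append(item)
        else:
            break
    return entries


def check_requested_gpus(requested, value):
    """Return a short diagnostic; `requested` mirrors `package gpu N`."""
    entries = visible_device_entries(value)
    if entries is None:
        return "CUDA_VISIBLE_DEVICES is unset; no restriction from the environment"
    if len(entries) < requested:
        return f"only {len(entries)} device(s) visible, but {requested} requested"
    return f"{len(entries)} device(s) visible; enough for {requested} requested"


if __name__ == "__main__":
    print(check_requested_gpus(2, os.environ.get("CUDA_VISIBLE_DEVICES")))
```

If the variable turned out to be set to a single index such as 0 in the job environment, that would explain why only one of the three physical GPUs shows up in the nvc_get_devices output.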