Multiple GPUs issue

Dear users,

I'm running a simulation with LAMMPS (14 May 2016 version). With 8 MPI tasks, 4 OpenMP threads, and 1 GPU, the run works. With 8 MPI tasks, 4 OpenMP threads, and 2 GPUs, it fails with the following error:

ERROR: Could not find/initialize a specified accelerator device (../gpu_extra.h:35)
Cuda driver error 4 in call at file 'geryon/nvd_device.h' in line 124.

Below are the relevant lines from my input file:

package omp 4
package gpu 2 neigh yes

pair_style lj/class2/coul/long/cuda 9.5
kspace_style pppm/cuda 0.00001

Kindly help me solve this issue.

Dear users,

I'm running a simulation with LAMMPS (14 May 2016 version). With 8 MPI tasks, 4 OpenMP threads, and 1 GPU, the run works. With 8 MPI tasks, 4 OpenMP threads, and 2 GPUs, it fails with the following error:

ERROR: Could not find/initialize a specified accelerator device (../gpu_extra.h:35)
Cuda driver error 4 in call at file 'geryon/nvd_device.h' in line 124.

what is your hardware configuration? what is the output of the
"nvc_get_devices" tool (in lib/gpu)?
do you actually have two (physical) GPUs on each compute node?

Below are the relevant lines from my input file:

this is next to useless.

package omp 4
package gpu 2 neigh yes

pair_style lj/class2/coul/long/cuda 9.5
kspace_style pppm/cuda 0.00001

you cannot use /cuda styles with the gpu package. you have to use /gpu styles.

axel.
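
A quick way to catch this kind of suffix mix-up before launching a run is to scan the input script for style commands that still end in /cuda. The following is only a rough sketch: the function name find_cuda_styles and the temporary example file are made up for illustration and are not part of LAMMPS, and the check only looks at lines that begin with a *_style command.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAMMPS): list style commands in an input
# script that still carry the /cuda suffix, which the gpu package cannot use.
find_cuda_styles() {
    # Matches lines such as "pair_style lj/class2/coul/long/cuda 9.5".
    grep -nE '^[[:space:]]*[a-z_]+_style[[:space:]].*/cuda([[:space:]]|$)' "$1"
}

# Small example input, written to a temporary file for this demo only.
example=$(mktemp)
cat > "$example" <<'EOF'
package omp 4
package gpu 2 neigh yes
pair_style lj/class2/coul/long/cuda 9.5
kspace_style pppm/cuda 0.00001
EOF

# grep exits with status 0 only when at least one line matched.
if find_cuda_styles "$example"; then
    echo "replace the /cuda suffix with /gpu in the lines listed above"
fi
```

If the function prints nothing, no /cuda styles were found on lines of that form; it does not verify that the /gpu variants exist in your LAMMPS build.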

Thanks for your prompt reply, Dr. Axel.

Sorry for the typographical mistakes.

package omp 4
package gpu 2 neigh yes

pair_style lj/class2/coul/long/gpu 9.5
kspace_style pppm/gpu 0.00001

I’m using the /gpu styles only.

Output of ./nvc_get_devices

Found 1 platform(s).
Using platform: NVIDIA Corporation NVIDIA CUDA Driver
CUDA Driver Version: 7.50

Device 0: “Tesla K40c”
Type of device: GPU
Compute capability: 3.5
Double precision support: Yes
Total amount of global memory: 11.2496 GB
Number of compute units/multiprocessors: 15
Number of cores: 2880
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum group size (# of threads per block) 1024 x 1024 x 64
Maximum item sizes (# threads for each dim) 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 0.745 GHz
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default
Concurrent kernel execution: Yes
Device has ECC support enabled: Yes

But I have 3 physical GPUs.

Here is the output of "nvidia-smi"

Can you check the value of the environment variable CUDA_VISIBLE_DEVICES?

-Trung
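
For anyone following along, here is one possible way to do that check from the shell the job is launched in. It is a sketch under assumptions: the helper name count_visible is invented for this example, and the script assumes that an unset variable leaves all GPUs visible while a set variable restricts CUDA programs to the comma-separated entries it lists.

```shell
#!/bin/sh
# Hypothetical helper: count the entries in a CUDA_VISIBLE_DEVICES-style list.
# An empty string means the variable is set but exposes no devices.
count_visible() {
    if [ -z "$1" ]; then
        echo 0
    else
        # Entries are comma-separated, e.g. "0,2"; count the non-empty ones.
        echo "$1" | tr ',' '\n' | grep -c .
    fi
}

# Distinguish "unset" from "set to an empty string".
if [ -z "${CUDA_VISIBLE_DEVICES+x}" ]; then
    echo "CUDA_VISIBLE_DEVICES is not set"
else
    n=$(count_visible "$CUDA_VISIBLE_DEVICES")
    echo "CUDA_VISIBLE_DEVICES='$CUDA_VISIBLE_DEVICES' lists $n device(s)"
fi

# List the GPUs the driver reports, if nvidia-smi is on the PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || echo "nvidia-smi could not query the driver"
fi
```

If the variable lists fewer devices than nvidia-smi reports, that could explain why nvc_get_devices shows only one GPU. Under a batch system the check should be run inside the job, since the scheduler may set the variable itself.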