Problem with CUDA driver error

Dear LAMMPS users,

I'm trying to enable GPU acceleration via the GPU and USER-CUDA packages in the latest LAMMPS distribution, and I'm running into problems with the benchmark examples. First, a summary of the LAMMPS distribution I'm using, and the output of nvc_get_devices from the lammps/lib/gpu directory:

Lammps dist: 9-Oct-2012
OS: Ubuntu 11.04 + Cuda toolkit 4.2 + Nvidia driver 295.41
I compiled LAMMPS and the extra libraries without any problem, and nvc_get_devices reports:

$ ./nvc_get_devices

Found 1 platform(s).

Using platform: NVIDIA Corporation NVIDIA CUDA Driver

CUDA Driver Version: 4.20

Device 0: "GeForce GTX 680"

   Type of device: GPU

   Compute capability: 3

   Double precision support: Yes

   Total amount of global memory: 1.99969 GB

   Number of compute units/multiprocessors: 8

   Number of cores: 1536

   Total amount of constant memory: 65536 bytes

   Total amount of local/shared memory per block: 49152 bytes

   Total number of registers available per block: 65536

   Warp size: 32

   Maximum number of threads per block: 1024

   Maximum group size (# of threads per block) 1024 x 1024 x 64

   Maximum item sizes (# threads for each dim) 2147483647 x 65535 x 65535

   Maximum memory pitch: 2147483647 bytes

   Texture alignment: 512 bytes

   Clock rate: 0.7055 GHz

   Run time limit on kernels: Yes

   Integrated: No

   Support host page-locked memory mapping: Yes

   Compute mode: Default

   Concurrent kernel execution: Yes

   Device has ECC support enabled: No

After the compilation, I tried to run the benchmark examples located in the bench/GPU directory. My work deals with the eam and meam potentials, so I used the eam example defined in these input files: in.eam.cpu / in.eam.gpu / in.eam.cuda

I invoked LAMMPS with the command-line options shown in the README file in that directory:

mpirun -np 12 /home/ekhi/bin/ -sf gpu -c off -v g 1 -v x 32 -v y 32 -v z 64 -v t 100 < in.eam.gpu > out.eam.gpu2

The case ran without problems; the LAMMPS log is in the attached file "out.eam.gpuCorrect".

But when I tried to run the same benchmark case with more atoms, using the following command line (changing only the x and y variables from 32 to 40):

mpirun -np 12 /home/ekhi/bin/ -sf gpu -c off -v g 1 -v x 40 -v y 40 -v z 64 -v t 100 < in.eam.gpu > out.eam.gpu2

I got this error:

Cuda driver error 702 in call at file 'geryon/nvd_timer.h' in line 76.

out.eam.gpuCorrect (2.27 KB)

out.eam.gpuError (1.19 KB)

Mike can comment. You should be able to run 1M atoms with the GPU package
on a single GPU.
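For a sense of the system sizes involved, here is a rough estimate (a sketch: it assumes the standard LAMMPS EAM bench input builds an fcc lattice, i.e. 4 atoms per unit cell, replicated by the x/y/z variables, which is not stated explicitly in this thread):

```python
# Rough atom-count estimate for the LAMMPS EAM benchmark.
# Assumption: fcc lattice (4 atoms per unit cell) replicated
# x * y * z times by the benchmark's region/create_atoms commands.
ATOMS_PER_CELL = 4

def atom_count(x, y, z):
    """Atoms for the given x/y/z replication factors."""
    return ATOMS_PER_CELL * x * y * z

print(atom_count(32, 32, 64))  # the run that worked: 262144
print(atom_count(40, 40, 64))  # the run that failed: 409600
print(atom_count(64, 64, 64))  # ~1M atoms, as mentioned above: 1048576
```

Under that assumption, the failing 40 40 64 case is still well under the 1M-atom size that should fit on a single GPU.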



I just checked the example. For 40 40 64 it uses 639 MB on my Kepler card with CUDA 5 and all its new goodies for better handling of GPU sharing. I am also running a beta driver from NVIDIA's 304.xx line. But here's the caveat:

it's 639 MB with 1 MPI process, and usage increases with more MPI processes:

MPI processes   GPU memory (MB)
 1               639
 4               820
12              1250

So it could just be that you are already running out of memory for 40 40 64, since you are using 12 MPI processes and a moderately older driver (NVIDIA has reduced the per-process overhead of grabbing the GPU over time).
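The scaling above is roughly linear in the number of processes; a quick least-squares fit (a sketch, using only the three measurements quoted above) gives the approximate per-process overhead:

```python
# Linear fit: memory(MB) ~ base + overhead * n_processes,
# using the three measurements quoted above.
procs = [1, 4, 12]
mem_mb = [639, 820, 1250]

n = len(procs)
mean_x = sum(procs) / n
mean_y = sum(mem_mb) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(procs, mem_mb))
sxx = sum((x - mean_x) ** 2 for x in procs)

overhead = sxy / sxx               # MB per additional MPI process
base = mean_y - overhead * mean_x  # memory with (extrapolated) zero processes

print(f"base ~ {base:.0f} MB, overhead ~ {overhead:.0f} MB/process")
# prints: base ~ 590 MB, overhead ~ 55 MB/process
```

On a 2 GB card, roughly 55 MB of extra overhead per MPI process adds up quickly with 12 processes, which is consistent with the out-of-memory explanation.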

With 4 processes and a 64 64 64 size it takes about 1.7 GB of GPU memory, so you should be able to run up to that size (maybe after updating your drivers). Also: I don't think you will gain much from going beyond 4 MPI processes per GPU; typically that's a good value. Sometimes more MPI processes make sense (if you are heavy on bonds or fixes, or if you leave the KSPACE calculation to the CPU), but I don't think that will be the case for your EAM calculations.


-------- Original Message --------