ERROR: GPU library not compiled for this accelerator

So, I get this error when compiling the gpu package. I have a GTX 760, which has compute capability 3.0, and I am running the CUDA 5.5 driver. (I paste the deviceQuery output and the GPU Makefile at the bottom.)

As far as I know, in order to get the GPU package working I need to change the architecture to arch=sm_30 in lib/gpu/Makefile.linux. However, even after doing this, I keep getting the error in the subject of this e-mail: GPU library not compiled for this accelerator.
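For reference, the line I changed in lib/gpu/Makefile.linux looks roughly like this (I am quoting it from memory, so the exact variable name may be slightly off):

CUDA_ARCH = -arch=sm_30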

I think I might have gotten something wrong here, since I thought the compute capability corresponded to the sm_XX part, but now I'm not so sure. Just to check what was going wrong, I tried installing and running the USER-CUDA package instead, and I had no problem running the examples given there, with both the 3.0 and 2.1 architectures.

Be aware, though, that I'm not absolutely sure about any of the things I checked; I tried digging a bit into the CUDA manual, but it all read like cipher to me (I admit I'm quite new to CUDA and haven't done much more than a couple of “Hello world” programs).

Can anyone help me with this?

************************ DEVICE QUERY RESULTS ******************************
CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 760"
CUDA Driver Version / Runtime Version 5.5 / 5.5
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)
( 6) Multiprocessors, (192) CUDA Cores/MP: 1152 CUDA Cores
GPU Clock rate: 1176 MHz (1.18 GHz)
Memory Clock rate: 3104 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = GeForce GTX 760
Result = PASS

Try using

-arch=sm_30

instead of

-arch sm_30

Also, make sure that you do a

make -f Makefile.linux clean

before you build in lib/gpu. See if you still have a problem.
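In other words, from the top-level LAMMPS directory, the rebuild would look roughly like the following (the src-side steps and the machine target name at the end are just my guess at your setup; use whichever src/MAKE Makefile you normally build with):

cd lib/gpu
make -f Makefile.linux clean
make -f Makefile.linux
cd ../../src
make yes-gpu      # make sure the GPU package files are (re)installed in src
make linux        # rebuild LAMMPS against the freshly built GPU library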

Best, - Mike

OK, I finally found the problem; it wasn't anything LAMMPS-related, so sorry for the noise.

For those who want to know what this was about: it turned out I had permission problems with files left over from previous builds, and I couldn't delete them all as a regular user (specifically, I couldn't remove the .h files), so the "make clean" wasn't actually doing its job.
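In case it helps anyone else, the kind of check and fix that sorted it out for me looked roughly like this (the chown step assumes the leftover files simply had the wrong owner, which was my situation; adjust it to whatever your permissions actually are):

cd lib/gpu
ls -l                          # look for leftover build files you don't own
sudo chown -R $USER .          # reclaim ownership of the stale files (assumption: wrong owner)
make -f Makefile.linux clean   # now the clean can actually remove the old headers
make -f Makefile.linux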

Thanks, Mike, for the input!

Pablo