Error running GPU-enabled LAMMPS build: ERROR: GPU library not compiled for this accelerator (../gpu_extra.h:40) ; Cuda driver error 4 in call at file 'geryon/nvd_device.h' in line 124.

Dear LAMMPS-USERS:

I got the error stated in the subject line. I have TITAN X (Pascal) GPUs with compute capability 6.1, and I am using CUDA driver version 8.0. (I have pasted the device query output and the GPU Makefile.linux at the bottom.)

My build order was:

1. Install openmpi.
2. Install fftw.
3. Modify …/lammps/src/MAKE/Makefile.mpi and build lmp_mpi.
4. Modify …/lammps/lib/gpu/Makefile.linux and run make -f Makefile.linux.
5. Return to …/lammps/src and run make yes-all ; make no-lib ; make no-user ; make yes-gpu.
6. Modify …/lammps/src/MAKE/Makefile.mpi again and run make mpi.

In the end I get lmp_gpu. To test the GPU build, I ran …/lammps/src/lmp_gpu -sf gpu < in.melt_imd-gpu (an example shipped with LAMMPS) and got this error: GPU library not compiled for this accelerator (../gpu_extra.h:40) ; Cuda driver error 4 in call at file 'geryon/nvd_device.h' in line 124.
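For reference, the GPU-related part of the build order above can be sketched as a small script. This is only an illustration under assumptions: LAMMPS_DIR is a hypothetical placeholder for the real (elided) install path, the step and build_gpu_lammps helpers are made up for this post, and by default the commands are only printed rather than executed.

```shell
#!/bin/sh
# Sketch of the build order described in this post. LAMMPS_DIR is a
# hypothetical placeholder, not the real path; with DRY_RUN=1 (the default)
# each step is printed instead of being run.
LAMMPS_DIR="${LAMMPS_DIR:-/path/to/lammps}"
DRY_RUN="${DRY_RUN:-1}"

# Run a command inside a directory, or just print it when DRY_RUN=1.
step() {
    dir="$1"; shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "(cd $dir && $*)"
    else
        (cd "$dir" && "$@")
    fi
}

build_gpu_lammps() {
    # Build the GPU library after editing CUDA_ARCH in Makefile.linux.
    step "$LAMMPS_DIR/lib/gpu" make -f Makefile.linux
    # Select packages in the LAMMPS source tree.
    step "$LAMMPS_DIR/src" make yes-all
    step "$LAMMPS_DIR/src" make no-lib
    step "$LAMMPS_DIR/src" make no-user
    step "$LAMMPS_DIR/src" make yes-gpu
    # Build the executable with the MPI makefile.
    step "$LAMMPS_DIR/src" make mpi
}

build_gpu_lammps
```

Setting DRY_RUN=0 would actually run the steps; the openmpi and fftw installs and the manual Makefile edits are left out because they depend on the local system.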

As far as I know, to solve this problem I need to change the architecture to arch=sm_61 in /lammps/lib/gpu/Makefile.linux. However, after I did this, the problem was not solved. The error still occurs.
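As a sanity check on the flag itself, the mapping from the compute capability reported by the device query (6.1) to the nvcc architecture flag can be sketched as follows. The helper name cc_to_arch is hypothetical and is not part of LAMMPS or CUDA; it only strips the dot from the version string.

```shell
#!/bin/sh
# cc_to_arch: hypothetical helper (not part of LAMMPS or CUDA) that converts a
# compute capability such as "6.1" into the matching nvcc flag "-arch=sm_61".
cc_to_arch() {
    digits=$(printf '%s' "$1" | tr -d '.')
    printf '%s\n' "-arch=sm_${digits}"
}

# Compute capability taken from the device query output in this post.
cc_to_arch 6.1   # prints -arch=sm_61
```

The printed value can be compared by eye with the uncommented CUDA_ARCH line in Makefile.linux, which in this case already reads -arch=sm_61.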

Can anyone help me with this? Thanks a lot.

DEVICE QUERY RESULTS************
Device 0: “TITAN X (Pascal)”
Type of device: GPU
Compute capability: 6.1
Double precision support: Yes
Total amount of global memory: 11.935 GB
Number of compute units/multiprocessors: 28
Number of cores: 5376
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum group size (# of threads per block) 1024 x 1024 x 64
Maximum item sizes (# threads for each dim) 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.531 GHz
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default
Concurrent kernel execution: Yes
Device has ECC support enabled: No

Device 1: “TITAN X (Pascal)”
Type of device: GPU
Compute capability: 6.1
Double precision support: Yes
Total amount of global memory: 11.935 GB
Number of compute units/multiprocessors: 28
Number of cores: 5376
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum group size (# of threads per block) 1024 x 1024 x 64
Maximum item sizes (# threads for each dim) 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.531 GHz
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default
Concurrent kernel execution: Yes
Device has ECC support enabled: No

GPU Makefile.linux

# /* ----------------------------------------------------------------------
#  Generic Linux Makefile for CUDA
#     - Change CUDA_ARCH for your GPU
# ------------------------------------------------------------------------- */

# which file will be copied to Makefile.lammps

EXTRAMAKE = Makefile.lammps.standard

ifeq ($(CUDA_HOME),)
CUDA_HOME = /usr/local/cuda
endif

NVCC = nvcc

# Tesla CUDA
CUDA_ARCH = -arch=sm_61
# newer CUDA
#CUDA_ARCH = -arch=sm_13
# older CUDA
#CUDA_ARCH = -arch=sm_10 -DCUDA_PRE_THREE
#CUDA_ARCH = -arch=sm_35

# this setting should match LAMMPS Makefile
# one of LAMMPS_SMALLBIG (default), LAMMPS_BIGBIG and LAMMPS_SMALLSMALL

LMP_INC = -DLAMMPS_SMALLBIG

# precision for GPU calculations
# -D_SINGLE_SINGLE  # Single precision for all calculations
# -D_DOUBLE_DOUBLE  # Double precision for all calculations
# -D_SINGLE_DOUBLE  # Accumulation of forces, etc. in double

CUDA_PRECISION = -D_SINGLE_SINGLE

CUDA_INCLUDE = -I$(CUDA_HOME)/include
CUDA_LIB = -L$(CUDA_HOME)/lib64
CUDA_OPTS = -DUNIX -O3 -Xptxas -v --use_fast_math $(LMP_INC)

CUDR_CPP = mpic++ -DMPI_GERYON -DUCL_NO_EXIT -DMPICH_IGNORE_CXX_SEEK -DOMPI_SKIP_MPICXX=1 -fPIC
CUDR_OPTS = -O2 $(LMP_INC) # -xHost -no-prec-div -ansi-alias

BIN_DIR = ./
OBJ_DIR = ./
LIB_DIR = ./
AR = ar
BSH = /bin/sh

CUDPP_OPT = -DUSE_CUDPP -Icudpp_mini

include Nvidia.makefile