can't initialize GPU, despite success of ./nvc_get_devices

Hello LAMMPS users,
Please help. I was able to build lammps-12Dec2018 with CUDA 8.0 and sm_20, and it runs successfully on M2070 GPUs.

However, when I recompile with CUDA 9.0 and sm_35 to run on a newer K20m GPU, I get this error as soon as I start a LAMMPS job:
ERROR: Unable to initialize accelerator for use (…/gpu_extra.h:45)

Other GPU applications successfully run on this GPU.

In the past, I have been able to fix this error by making sure the CUDA_ARCH used to compile libgpu.a was consistent with the compute capability (cc) reported by ./nvc_get_devices. But this time I am really stuck.
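
For readers who want to repeat that consistency check, here is a minimal sketch. It assumes the compute capability is copied by hand from the ./nvc_get_devices output, and the CUDA_ARCH line shown is only a hypothetical example of what the lib/gpu Makefile might contain, not a line taken from an actual build.

```shell
# Compute capability as reported by ./nvc_get_devices (copied by hand).
cc="3.5"

# Drop the dot to get the architecture name nvcc expects: 3.5 -> sm_35.
arch="sm_$(printf '%s' "$cc" | tr -d '.')"

# Hypothetical CUDA_ARCH line from the Makefile used to build libgpu.a.
makefile_line="CUDA_ARCH = -arch=sm_35"

# Compare the architecture in the Makefile line with the one the device needs.
case "$makefile_line" in
  *"-arch=$arch"*) result="match" ;;
  *)               result="mismatch" ;;
esac

echo "device needs $arch: $result"
```

If the result is a mismatch, libgpu.a was built for a different architecture than the device reports.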

My output from ./nvc_get_devices is:

$ ./nvc_get_devices
Found 1 platform(s).
Using platform: NVIDIA Corporation NVIDIA CUDA Driver
CUDA Driver Version: 9.10

Device 0: "Tesla K20m"
Type of device: GPU
Compute capability: 3.5
Double precision support: Yes
Total amount of global memory: 4.63269 GB
Number of compute units/multiprocessors: 13
Number of cores: 2496
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum group size (# of threads per block) 1024 x 1024 x 64
Maximum item sizes (# threads for each dim) 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 0.7055 GHz
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default

Concurrent kernel execution: Yes
Device has ECC support enabled: Yes

The output from nvidia-smi is:

$ nvidia-smi
Wed May 1 16:02:13 2019


Please try the latest LAMMPS patch version:
and try compiling with CMake. The CMake configuration script will try to build “fat” binaries for all architectures supported by your CUDA toolkit, provided you use CUDA mode (note: the default is OpenCL mode).
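
As a rough sketch of what such a configure step might look like: the snippet below assumes a build directory directly under the LAMMPS source tree (so the CMake files are at ../cmake) and uses the PKG_GPU, GPU_API and GPU_ARCH options of the LAMMPS CMake build. Please check the build documentation for your version, since option names and defaults may differ. The variable names are just for illustration, and the script only prints the command so it can be reviewed before running it.

```shell
# Primary architecture for the K20m (compute capability 3.5).
gpu_arch="sm_35"

# Assumed layout: run from a build directory next to the cmake/ folder.
# GPU_API=cuda selects CUDA mode instead of the default OpenCL mode.
cmake_cmd="cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_ARCH=${gpu_arch} ../cmake"

# Dry run: show the command instead of executing it.
echo "$cmake_cmd"
```

Once the printed command looks right for your setup, run it from the build directory and then build as usual.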

Or, if you have problems with CMake or prefer the conventional build system, try building the GPU library with Makefile.linux_multi.

Compiling the GPU library in OpenCL mode would be another possible workaround.