Error after building lib/gpu

I ran into this error after building lib/gpu. What could be the issue, and how can I fix it?

zac3553@i-Zac:~/lammps-23Jun2022/examples/mc$ mpiexec -np 6 lmp_mpi -in in.pure -sf gpu
LAMMPS (23 Jun 2022 - Update 3)
Reading data file ...
orthogonal box = (-8.21157 -8.21157 -8.21157) to (8.21157 8.21157 8.21157)
1 by 2 by 3 MPI processor grid
reading atoms ...
4000 atoms
scanning bonds ...
1 = max bonds/atom
scanning angles ...
1 = max angles/atom
reading bonds ...
3900 bonds
reading angles ...
3800 angles
Finding 1-2 1-3 1-4 neighbors ...
special bond factors lj: 0 1 1
special bond factors coul: 0 0 0
2 = max # of 1-2 neighbors
2 = max # of 1-3 neighbors
4 = max # of 1-4 neighbors
6 = max # of special neighbors
special bonds CPU = 0.004 seconds
read_data CPU = 0.057 seconds
dynamic group g1 defined
dynamic group g2 defined

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:

  - GPU package (short-range, long-range and three-body potentials):
    The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE


- Using acceleration for lj/cut:
-  with 6 proc(s) per device.
-  Horizontal vector operations: ENABLED
-  Shared memory system: No

Device 0: NVIDIA GeForce RTX 2070 with Max-Q Design, 36 CUs, 7/8 GB, 1.1 GHZ (Mixed Precision)

Initializing Device and compiling on process 0...Cuda driver error 500 in call at file 'geryon/nvd_texture.h' in line 112.
Abort(-1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
zac3553@i-Zac:~/lammps-23Jun2022/examples/mc$

zac3553@i-Zac:~/lammps-23Jun2022/lib/gpu$ ./nvc_get_devices
Found 1 platform(s).
CUDA Driver Version: 12.10

Device 0: "NVIDIA GeForce RTX 2070 with Max-Q Design"
Type of device: GPU
Compute capability: 7.5
Double precision support: Yes
Total amount of global memory: 7.99969 GB
Number of compute units/multiprocessors: 36
Number of cores: 6912
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum group size (# of threads per block) 1024 x 1024 x 64
Maximum item sizes (# threads for each dim) 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.125 GHz
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default
Concurrent kernel execution: Yes
Device has ECC support enabled: No

zac3553@i-Zac:~/lammps-23Jun2022/examples/mc$ nvidia-smi
Sun Mar 12 04:58:41 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 531.14       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf          Pwr:Usage/Cap  |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2070 w...    On | 00000000:01:00.0 Off |                  N/A |
| N/A   51C    P8                6W / N/A |    326MiB /  8192MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        22    G   /Xwayland                                    N/A     |
+---------------------------------------------------------------------------------------+

It is difficult to tell what the cause of this is. The 23 June 2022 version (regardless of the update revision) predates CUDA 12 and thus has not been vetted against the CUDA 12.1 that you seem to be using. We only recently updated our test machines and fixed a bunch of portability issues in the GPU package (mostly for OpenCL compilation, though, not CUDA) in LAMMPS version 8 Feb 2023, and that was with CUDA 12.0, not 12.1.
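
To confirm which toolkit your lib/gpu build actually used, you can compare the compiler and driver versions (assuming the nvcc that built lib/gpu is on your PATH):

nvcc --version                                      # CUDA toolkit that compiled lib/gpu
nvidia-smi --query-gpu=driver_version --format=csv  # installed driver version

If nvcc reports a 12.x toolkit, you are in the untested territory described above.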

When you use a significantly newer version of CUDA than what was in common use at the time a LAMMPS version was released (i.e. not necessarily the latest version, but what was commonly available in repackaged form for Linux distributions), you run the risk that the Nvidia engineers made internal changes to the CUDA toolkit or the OpenCL runtime that are not (yet) anticipated in LAMMPS.
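
One way to lower that risk is to rebuild lib/gpu against an older toolkit. A minimal sketch, assuming a CUDA 11.x toolkit is installed under /usr/local/cuda-11.8 (the path and the make variables below are assumptions; check lib/gpu/Makefile.linux for the settings your build actually uses):

export CUDA_HOME=/usr/local/cuda-11.8
export PATH=$CUDA_HOME/bin:$PATH
cd ~/lammps-23Jun2022/lib/gpu
make -f Makefile.linux clean
# sm_75 matches the compute capability 7.5 that nvc_get_devices reports
make -f Makefile.linux CUDA_HOME=$CUDA_HOME CUDA_ARCH="-arch=sm_75"

Remember to relink the lmp_mpi executable afterwards so it picks up the rebuilt library.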

That said, another possible cause is how the GPU package and its support library were configured and compiled. They could have been misconfigured or built with incorrect settings. This is difficult to tell remotely. The instructions in the manual and the README file are quite detailed, though.
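
For comparison, a CMake configuration of the GPU package for this card would look roughly like the following; PKG_GPU, GPU_API, GPU_ARCH, and GPU_PREC are the documented CMake options, with sm_75 again matching compute capability 7.5 (the build directory name is arbitrary):

cd ~/lammps-23Jun2022
mkdir build-gpu && cd build-gpu
cmake ../cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_ARCH=sm_75 -D GPU_PREC=mixed
cmake --build . --parallel 6

If a clean build configured this way still aborts with the same driver error, the CUDA version mismatch becomes the more likely culprit.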