Issue with GPU package

Dear all,

I successfully compiled the latest version of LAMMPS with the GPU package using OpenCL, but I get an error when running my script on my workstation, which has 40 cores and runs the latest version of Ubuntu:

package gpu 0 split 1.0 tpa 2 omp 1
suffix gpu
.
.
.
.

  • Using acceleration for eam/alloy:
  • with 40 proc(s) per device.
  • with OpenCL Parameters for: NVIDIA_GPU (203)
  • Horizontal vector operations: ENABLED
  • Shared memory system: No

Device 0: Quadro K620, 3 CUs, 2 GB, 1.1 GHZ (Mixed Precision)

Initializing Device and compiling on process 0…Done.
Initializing Device 0 on core 0…Done.
Initializing Device 0 on core 1…Done.
Initializing Device 0 on core 2…Done.
Initializing Device 0 on core 3…Done.
Initializing Device 0 on core 4…Done.
Initializing Device 0 on core 5…Done.
Initializing Device 0 on core 6…Done.
Initializing Device 0 on core 7…Done.
Initializing Device 0 on core 8…Done.
Initializing Device 0 on core 9…Done.
Initializing Device 0 on core 10…Done.
Initializing Device 0 on core 11…Done.
Initializing Device 0 on core 12…Done.
Initializing Device 0 on core 13…Done.
Initializing Device 0 on core 14…Done.
Initializing Device 0 on core 15…Done.
Initializing Device 0 on core 16…Done.
Initializing Device 0 on core 17…Done.
Initializing Device 0 on core 18…Done.
Initializing Device 0 on core 19…Done.
Initializing Device 0 on core 20…Done.
Initializing Device 0 on core 21…Done.
Initializing Device 0 on core 22…Done.
Initializing Device 0 on core 23…Done.
Initializing Device 0 on core 24…Done.
Initializing Device 0 on core 25…Done.
Initializing Device 0 on core 26…Done.
Initializing Device 0 on core 27…Done.
Initializing Device 0 on core 28…Done.
Initializing Device 0 on core 29…Done.
Initializing Device 0 on core 30…Done.
Initializing Device 0 on core 31…Done.
Initializing Device 0 on core 32…Done.
Initializing Device 0 on core 33…Done.
Initializing Device 0 on core 34…Done.
Initializing Device 0 on core 35…Done.
Initializing Device 0 on core 36…Done.
Initializing Device 0 on core 37…Done.
Initializing Device 0 on core 38…Done.
Initializing Device 0 on core 39…Done.

Neighbor list info …
update every 1 steps, delay 0 steps, check yes
max neighbors/atom: 10000, page size: 100000
master list distance cutoff = 8.506786
ghost atom cutoff = 8.506786
binsize = 5.506786, bins = 44 19 55
0 neighbor lists, perpetual/occasional/extra = 0 0 0
Setting up Verlet run …
Unit style : metal
Current step : 0
Time step : 0.001
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.

MPI_ABORT was invoked on rank 38 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.

OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
[cauchy:1193482] PMIX ERROR: UNREACHABLE in file …/…/…/src/server/pmix_server.c at line 2193
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
OpenCL error in file ‘/home/cauchy/Desktop/LAMMPS/lammps-4May2022/lib/gpu/geryon/ocl_kernel.h’ in line 467 : -4.
[cauchy:1193482] 31 more processes have sent help message help-mpi-api.txt / mpi-abort
[cauchy:1193482] Set MCA parameter “orte_base_help_aggregate” to 0 to see all help / error messages

This is a very poor GPU: very old, with very few CUDA cores, a low clock, little RAM, and a narrow memory bus. Chances are that running on this kind of GPU will be barely faster than a single CPU core. Oversubscribing such a GPU 40-fold is pure folly. Do you even have 40 CPU cores, or 20 with hyper-threading enabled?

The only reasonable action here is to ignore the GPU or get a (much) better one. But even for a top-of-the-line GPU, oversubscribing is likely effective only up to 4-6 fold.
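For what it is worth, the status code -4 in the log corresponds to CL_MEM_OBJECT_ALLOCATION_FAILURE in the Khronos OpenCL header, which would be consistent with 40 processes competing for a 2 GB device. The sketch below decodes the code and estimates an upper bound on device memory per rank. The helper names are made up for illustration, the "2 GB" from the log is treated as 2 GiB, and driver and context overhead is ignored, so the real budget per rank is smaller.

```python
# Decode the OpenCL status code from the log and estimate the GPU memory
# available to each MPI rank when a single device is shared by all ranks.

# Status codes as defined in the Khronos OpenCL header (CL/cl.h).
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}


def decode_opencl_error(code):
    """Return the symbolic name for an OpenCL status code (hypothetical helper)."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL error ({code})")


def memory_per_rank_mib(device_gib, ranks):
    """Upper bound on device memory per rank in MiB, ignoring all overhead."""
    return device_gib * 1024 / ranks


if __name__ == "__main__":
    # Values taken from the log: error code -4, 2 GB device, 40 procs per device.
    print(decode_opencl_error(-4))
    print(f"{memory_per_rank_mib(2, 40):.1f} MiB per rank at most")
```

This only narrows down what the error code means; it does not prove that memory exhaustion is the sole cause of the crash.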

To give you a point of reference, I just logged into a very old workstation (from 2009) with an Nvidia GTX Titan GPU (Kepler 3.5 architecture, not quite as old). That GPU has 14 CUs (almost 5x as many as yours), 6 GB RAM, and a clock of 0.9 GHz. With this GPU (high end at the time), the in.eam benchmark in the bench folder runs in 0.3 seconds using just one MPI rank. Using 2 MPI ranks takes 0.5 seconds, so it is not useful to oversubscribe the GPU for that input. A single CPU core completes this input in 6.5 seconds; all 8 CPU cores (2 sockets, quad core) complete it in just under 1 second. So using just one MPI rank and the GPU (better than yours but of similar age) is only about 3-4 times faster than all 8 CPU cores. Since you have fewer CUDA cores and more CPU cores, expecting any acceleration from that kind of GPU is without ground.
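The ratios behind these numbers can be checked with a few lines of arithmetic. This is only a sketch using the timings quoted above; "just under 1 second" is rounded up to 1.0 s, and the dictionary keys and function name are chosen here for illustration.

```python
# Timings for the in.eam benchmark as quoted in the post (seconds).
timings = {
    "1 CPU core": 6.5,
    "8 CPU cores": 1.0,      # "just under 1 second", rounded up
    "GPU, 1 MPI rank": 0.3,
    "GPU, 2 MPI ranks": 0.5,
}


def speedup(slow, fast):
    """How many times faster the 'fast' run is than the 'slow' run."""
    return slow / fast


if __name__ == "__main__":
    gpu = timings["GPU, 1 MPI rank"]
    for label, seconds in timings.items():
        if label == "GPU, 1 MPI rank":
            continue
        ratio = speedup(seconds, gpu)
        print(f"GPU with 1 rank vs {label}: {ratio:.1f}x faster")

    # Compute units: GTX Titan (14 CUs) vs Quadro K620 (3 CUs, from the log).
    print(f"CU ratio: {14 / 3:.1f}x")
```

With the rounded 8-core timing the GPU comes out a bit over 3x faster than all CPU cores together, which matches the "about 3-4 times" estimate, and adding a second rank on the GPU makes the run slower rather than faster.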

Thanks a lot Axel for the detailed response!

I have 40 CPU cores. Following your advice, I will use the OMP package with one thread per MPI task to accelerate the runs, unless you can recommend another package. I mostly work with EAM, MEAM, ML-HDNNP, and ML-PACE potentials in my simulations.

Given the age of your GPU, and thus the probable age of your workstation and CPU, and considering how unreliable the information you provided in previous exchanges was, it is unlikely that you have 40 CPU cores. I will only believe it when I see it.

The LAMMPS manual has detailed discussions on how to optimize performance and use accelerated packages effectively.

Of the potentials you mention, only EAM has OpenMP support. Other accelerator support is listed in the corresponding documentation when available.