Latest GPU support

Dear lammps developers!

I found that the latest Nvidia GPUs are not compatible with the pre-compiled Windows versions of LAMMPS.
I assume the reason is the compilation option GPU_ARCH.
I found that a Kepler GPU works fine, but it is not possible to use a Turing GPU (for which GPU_ARCH=75 is required). Is it possible to adjust the compilation settings for future releases of the Windows binaries, so that calculations can be carried out on the more efficient modern Turing GPUs?

Best regards, Dmitry

the pre-compiled windows binaries do not use CUDA but the OpenCL variant of the GPU package, so GPU_ARCH settings are irrelevant. the pre-compiled windows binaries are built with a cross-compiler, but there is no CUDA cross-compiler and LAMMPS currently cannot be built with the native windows (visual c++) compilers.
please provide the error messages that LAMMPS produces and the output of running ocl_get_devices

thanks,
axel.

Many thanks for the fast reply!
Let me describe two cases:

  1. GTX 760 (Kepler)
    LAMMPS works fine with a command like this:
    C:\Users\XEON2650V2\Desktop\GPU test>lmp_serial.exe -sf gpu -i test.inp

  2. GTX 1660 Ti (Turing)

C:\Users\XEON2650V2\Desktop\GPU test>lmp_serial.exe -sf gpu -pk gpu 1 -i test.inp

LAMMPS (19 Sep 2019)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (…/comm.cpp:93)
using 1 OpenMP thread(s) per MPI task
ERROR: Could not find/initialize a specified accelerator device (…/gpu_extra.h:35)
Last command: package gpu 1

The strange thing is that if I redirect the output to a file instead of printing it to the command line, the error message is different:

C:\Users\XEON2650V2\Desktop\GPU test>lmp_serial.exe -sf gpu -pk gpu 1 -i test.inp > test.out

test.out contains:

LAMMPS (19 Sep 2019)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (…/comm.cpp:93)
using 1 OpenMP thread(s) per MPI task
ERROR: Invalid OpenCL platform ID. (…/gpu_extra.h:62)
Last command: package gpu 1

Unfortunately, ocl_get_devices fails with an error in both cases: libstdc++-6.dll and libwinpthread-1.dll are not found. Despite this, the Nvidia OpenCL benchmarks work fine in both cases.
The latest Nvidia drivers are installed in both cases.

Regards,

Dmitry

thanks for the feedback and sorry about the oversight with linking the ocl_get_devices binary. i’m making adjustments to the build settings for the windows binaries and am running a rebuild of the last patch release and the last stable release with those updates. i rarely use windows myself (the binaries are built with a cross-compiler on linux and i only test them on a windows box if there are confirmed bugs that cannot be reproduced and debugged on linux), so i don’t notice such issues easily.

the output of ocl_get_devices would have given me confirmation of what the error messages hint at: you most likely have multiple OpenCL installable client drivers (ICDs) from multiple vendors on your system, e.g. for Intel CPUs, and the driver for the Nvidia GPU is not the first platform/device. so LAMMPS won’t pick it up by default, and the platform it picks instead is not compatible with the requirements for the OpenCL kernels in LAMMPS.

you should be able to resolve this issue by using the “device” keyword in addition to -pk gpu 1. without the ocl_get_devices output, you have to figure this out through trial and error. for a discussion of the option, please see the documentation of the package keyword at https://lammps.sandia.gov/doc/package.html

a more drastic approach would be to remove unwanted vendors from the corresponding registry key “HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors” with the registry editor.
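Before removing anything, it may help to look at what is registered under that key. The following is only a minimal sketch in Python using the standard-library `winreg` module. The function name `list_opencl_vendors` is made up for this illustration, and the sketch assumes the ICDs are registered as values directly under the key named above. It only reads the registry and changes nothing.

```python
import sys

# Registry path quoted in the post above (under HKEY_LOCAL_MACHINE).
VENDORS_KEY = r"SOFTWARE\Khronos\OpenCL\Vendors"


def list_opencl_vendors():
    """Return (value_name, data) pairs found under the Vendors key.

    Hypothetical helper for illustration. Returns an empty list on
    non-Windows systems or when the key does not exist.
    """
    if sys.platform != "win32":
        return []
    import winreg  # standard library, available on Windows only

    try:
        key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, VENDORS_KEY)
    except OSError:
        return []
    with key:
        # QueryInfoKey returns (number of subkeys, number of values, mtime).
        n_values = winreg.QueryInfoKey(key)[1]
        # EnumValue returns (name, data, type); keep name and data only.
        return [winreg.EnumValue(key, i)[:2] for i in range(n_values)]


if __name__ == "__main__":
    for name, data in list_opencl_vendors():
        print(name, data)
```

If the Nvidia entry is missing from this list, removing the other vendors is unlikely to help, since there would be no Nvidia ICD left for LAMMPS to pick.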

i will post an update here, if the corrected windows packages are posted.

axel.

Unfortunately, neither removing the registry key nor experimenting with the device keyword leads to any result.
If I do not delete the registry keys, ocl_get_devices from lammps-64bit-20160309 prints the following output:

C:\Program Files\LAMMPS 64-bit 20160309\bin>ocl_get_devices
Found 1 platform(s).
Using platform: Intel® Corporation Intel® CPU Runtime for OpenCL™ Applications OpenCL 2.1 WINDOWS

Device 0: " Intel® Xeon® CPU E5-2650 v2 @ 2.60GHz"
Type of device: CPU
Double precision support: Yes
Total amount of global memory: 15.9315 GB
Number of compute units/multiprocessors: 16
Total amount of constant memory: 131072 bytes
Total amount of local/shared memory per block: 32768 bytes
Maximum group size (# of threads per block) 8192
Maximum item sizes (# threads for each dim) 8192 x 8192 x 8192
Clock rate: 2.6 GHz
ECC support: No
Device fission into equal partitions: Yes
Device fission by counts: Yes
Device fission by affinity: No
Maximum subdevices from fission: 16

C:\Program Files\LAMMPS 64-bit 20160309\bin>

So no GPU is listed at all. I don’t know whether the 2016 version is unable to recognize modern GPUs or whether LAMMPS / ocl_get_devices really does not see the GPU.

GPU driver version is 431.60.

Regards,
Dmitry

updated windows installer packages for the latest patch and stable release are posted now. please uninstall your current version, download and install the new one, and try running ocl_get_devices again. if it works, you can confirm whether my hunch about the multiple ICDs is correct.

thanks,
axel.

this being windows, and driver support being a rather low level feature, you may have to reboot to make the registry changes stick.
another possible cause for problems could be, that you may have installed the nvidia drivers, but are missing (full) OpenCL support.
there have been some changes as to how OpenCL devices are enumerated since 2016, but not many.

there are some known issues with some OpenCL versions (especially those provided by AMD) and the GPU package kernels, but we can exclude that this applies here, since you could run successfully with the same drivers on some other machine.

at the moment everything points at your GPU package problems being caused by some local driver/software configuration or installation issue and not LAMMPS itself. the only way i currently see to disprove this hypothesis would be to swap hardware: exchange the GPU causing problems with the one in the machine where there are no problems and check if the issue migrates with the GPU or sticks with the machine.

i have successfully tested the package gpu devices option on a linux box with multiple OpenCL ICDs and i have been able to run OpenCL kernels on windows on the CPU, but i have no way to test on windows with a (recent) nvidia GPU. so there is not much else i can do for you.

please note, that i am maintaining those windows installer packages primarily for the use in tutorials that i am occasionally teaching and we make them available because a lot of people are asking for them and the compilation process is mostly automated. for proper support of Windows (and specifically for porting LAMMPS to fully work with visual c++ and supporting KOKKOS and CUDA) some form of sponsorship (e.g. through gofundme.com or similar) or the help of a volunteer with some free time and more experience in programming/porting of applications on windows with vc++ would be needed.

axel.

Many thanks for your tech support!

I am going to try another PC to test the 1660 Ti. If it succeeds, I will report back.
Here is the output of the latest ocl_get_devices on my current PC:

  1. GTX 760 + GPU driver installed

C:\Program Files\LAMMPS 64-bit 19Sep2019\bin>ocl_get_devices
Found 2 platform(s).
Using platform: Intel® Corporation Intel® CPU Runtime for OpenCL™ Applications OpenCL 2.1 WINDOWS

Platform 0:

Device 0: " Intel® Xeon® CPU E5-2650 v2 @ 2.60GHz"
Type of device: CPU
Double precision support: Yes
Total amount of global memory: 15.9315 GB
Number of compute units/multiprocessors: 16
Total amount of constant memory: 131072 bytes
Total amount of local/shared memory per block: 32768 bytes
Maximum group size (# of threads per block) 8192
Maximum item sizes (# threads for each dim) 8192 x 8192 x 8192
Clock rate: 2.6 GHz
ECC support: No
Device fission into equal partitions: Yes
Device fission by counts: Yes
Device fission by affinity: No
Maximum subdevices from fission: 16

Platform 1:

Device 0: "GeForce GTX 760"
Type of device: GPU
Double precision support: Yes
Total amount of global memory: 2 GB
Number of compute units/multiprocessors: 6
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Maximum group size (# of threads per block) 1024
Maximum item sizes (# threads for each dim) 1024 x 1024 x 64
Clock rate: 1.032 GHz
ECC support: No
Device fission into equal partitions: No
Device fission by counts: No
Device fission by affinity: No
Maximum subdevices from fission: 1

C:\Program Files\LAMMPS 64-bit 19Sep2019\bin>

  2. GTX 1660 Ti + GPU driver installed

C:\Program Files\LAMMPS 64-bit 19Sep2019\bin>ocl_get_devices
Found 1 platform(s).
Using platform: Intel® Corporation Intel® CPU Runtime for OpenCL™ Applications OpenCL 2.1 WINDOWS

Platform 0:

Device 0: " Intel® Xeon® CPU E5-2650 v2 @ 2.60GHz"
Type of device: CPU
Double precision support: Yes
Total amount of global memory: 15.9315 GB
Number of compute units/multiprocessors: 16
Total amount of constant memory: 131072 bytes
Total amount of local/shared memory per block: 32768 bytes
Maximum group size (# of threads per block) 8192
Maximum item sizes (# threads for each dim) 8192 x 8192 x 8192
Clock rate: 2.6 GHz
ECC support: No
Device fission into equal partitions: Yes
Device fission by counts: Yes
Device fission by affinity: No
Maximum subdevices from fission: 16

C:\Program Files\LAMMPS 64-bit 19Sep2019\bin>
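
The two listings above differ in whether any platform exposes a device of type GPU. As a rough sketch, that check can be automated by parsing the text that ocl_get_devices prints. The Python below relies only on the line formats visible in the output above ("Platform N:", "Device N: "..."" and "Type of device: ..."). The function names and the shortened sample text are made up for illustration and are not part of LAMMPS.

```python
import re


def parse_ocl_devices(text):
    """Return a list of (platform_id, device_id, name, device_type) tuples."""
    devices = []
    platform = 0  # assumption: output without "Platform N:" headers is platform 0
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Platform (\d+):", line)
        if m:
            platform = int(m.group(1))
            continue
        m = re.match(r'Device (\d+): "(.*)"', line)
        if m:
            current = [platform, int(m.group(1)), m.group(2).strip(), None]
            devices.append(current)
            continue
        m = re.match(r"Type of device: (\S+)", line)
        if m and current is not None:
            current[3] = m.group(1)
    return [tuple(d) for d in devices]


def gpu_platforms(text):
    """Return the sorted platform IDs that list at least one GPU device."""
    return sorted({p for p, _, _, kind in parse_ocl_devices(text) if kind == "GPU"})


# Shortened sample in the same layout as the GTX 760 listing above.
SAMPLE = """Found 2 platform(s).
Platform 0:

Device 0: " Intel Xeon CPU"
Type of device: CPU

Platform 1:

Device 0: "GeForce GTX 760"
Type of device: GPU
"""

if __name__ == "__main__":
    print(gpu_platforms(SAMPLE))
```

An empty result, as the GTX 1660 Ti listing would give, means the OpenCL runtime does not expose the GPU at all, so no LAMMPS setting can select it.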

Dear all,

I tried to run the 1660 Ti for LAMMPS MD calculations on two different Windows platforms, all to no avail.
But on Linux, everything works fine! So the 1660 Ti is now installed in the Linux HEDT machine. I hope my experience will be helpful.

Regards,
Dmitry