GPU package

I installed the GPU package according to the instructions given at http://lammps.sandia.gov/doc/accelerate_gpu.html

Everything went smoothly and when I run the command

mpirun lmp_ubuntu -pk gpu 1 -in in.Li-dendritic.nvt

the calculation runs fine (though I doubt it is actually using the GPU), but when I run the command

mpirun lmp_ubuntu -sf gpu -pk gpu 1 -in in.Li-dendritic.nvt

I receive the error:
LAMMPS (16 Mar 2018)
Cuda driver error 999 in call at file 'geryon/nvd_device.h' in line 273.
Cuda driver error 999 in call at file 'geryon/nvd_device.h' in line 273.
Cuda driver error 999 in call at file 'geryon/nvd_device.h' in line 273.
Cuda driver error 999 in call at file 'geryon/nvd_device.h' in line 273.

> mpirun lmp_ubuntu -pk gpu 1 -in in.Li-dendritic.nvt

it won't run on the GPU unless you explicitly request pair styles with
names ending in /gpu in that input.

> mpirun lmp_ubuntu -sf gpu -pk gpu 1 -in in.Li-dendritic.nvt

that will try to find /gpu styles, even if they are not explicitly named,
by appending /gpu to every style name and falling back to non-gpu
versions, if gpu variants are not available. please see the LAMMPS manual
for more details on this.

what kind of GPU do you have? did you compile for CUDA or OpenCL? what
version of CUDA - if any - did you use? what are the compiler/GPU settings
in your lib/gpu compilation?

axel.
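To illustrate the two ways of requesting GPU acceleration that axel describes, here is a minimal sketch of a LAMMPS input fragment. The lj/cut style is only a stand-in: the actual styles live in in.Li-dendritic.nvt, which was not posted.

```
# explicit: name the accelerated variant directly in the input script
pair_style      lj/cut/gpu 2.5

# implicit: keep the plain style name ...
pair_style      lj/cut 2.5
# ... and let the suffix machinery append /gpu wherever a variant exists,
# either via the command line (-sf gpu -pk gpu 1) or in the script itself:
suffix          gpu
package         gpu 1
```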

Thanks for the clarification, axel, and apologies for not providing full information. Makefile.linux looks like:

ifeq ($(CUDA_HOME),)
CUDA_HOME = /usr/local/cuda-9.1
endif
NVCC = nvcc
# Tesla CUDA
#CUDA_ARCH = -arch=sm_21
# newer CUDA
#CUDA_ARCH = -arch=sm_13
# older CUDA
#CUDA_ARCH = -arch=sm_10 -DCUDA_PRE_THREE
CUDA_ARCH = -arch=sm_30

I use CUDA 9.1 on Ubuntu 18.04. The GPU is a Quadro K2100M.


please also provide the output of lib/gpu/nvc_get_devices
and try to run the example inputs in bench/FERMI on your GPU.
also, please check whether you have the required permissions to access the
GPU device.

axel.

Sorry, my problem was unrelated to compiling. It was resolved by disabling and re-enabling the GPU. After that, I was able to successfully run the command

mpirun lmp_ubuntu -sf gpu -pk gpu 1 -in in.Li-dendritic.nvt

And there were 4 GPU processes run by lmp_ubuntu. However, it took 34 min, exactly the same as the non-GPU run. I understand that GPU acceleration shows up for large numbers of atoms, but I expected to witness at least a tiny contribution. This is the fun part of playing with the variables mentioned in the manual. If I make a breakthrough, I will share it with the community.
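For side-by-side timing comparisons like this, the wall time can be pulled out of each run's log.lammps. The helper below is not part of LAMMPS, just a sketch that matches the "Loop time" line LAMMPS prints at the end of a run:

```python
import re

def loop_time(log_text):
    """Return the wall time in seconds from a LAMMPS 'Loop time' line, or None."""
    m = re.search(r"Loop time of ([0-9.eE+-]+) on (\d+) procs", log_text)
    return float(m.group(1)) if m else None

# sample line in the format LAMMPS prints at the end of a run
line = "Loop time of 2040.1 on 4 procs for 10000 steps with 32000 atoms"
print(loop_time(line))  # -> 2040.1
```

Running it over the CPU and GPU logs gives two numbers that can be compared directly, instead of eyeballing the logs.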

Thanks for your excellent support.


please keep in mind that you do not have a very powerful GPU, with a
rather slow memory transfer speed compared to high-end desktop GPUs.
this results in additional overhead. since the GPU kernels essentially have
to do twice the work (for more efficient parallelization), it is much more
challenging to have the GPU acceleration offset the overhead with a
moderate-sized mobile GPU than with a high-end desktop/workstation GPU.
in addition, it depends a lot on the size of the system (as you noted) and
the specifics of the force field setup (which you have not provided).

thus i strongly suggest you experiment first with the benchmark examples
bundled with LAMMPS before rolling your own. that - and the corresponding
performance numbers published on lammps.sandia.gov - will give you an
assessment of what acceleration is possible with those inputs and then you
can assess what might be possible with your specific setup.
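As a concrete starting point, the bundled LJ benchmark can be run with and without the gpu suffix and the "Loop time" lines compared. The file names below assume the bench/ directory of a standard LAMMPS source tree and the lmp_ubuntu binary from this thread:

```shell
cd bench
# reference run on the CPU only
mpirun -np 4 lmp_ubuntu -in in.lj
# same input, offloading pair computations to the GPU
mpirun -np 4 lmp_ubuntu -sf gpu -pk gpu 1 -in in.lj
```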

axel.