Hello all,
I was able to compile LAMMPS with the gpu package and have already run
some simulations without errors, even with pppm/gpu.
Recently, one of my simulations produced a lot of CUDA driver errors, but
when I disable just pppm/gpu (replacing pppm with ewald) it runs just
fine. (Note that I use the gpu command suffix in the input file.)
The simulation consists of an isolated modified cellulose fragment (an
octamer) with 579 atoms. I'm just testing the simulation. More details are
in the attached files.
this is an extremely small system, for which GPU acceleration is meaningless.
Together with the LAMMPS Makefiles I used for the software itself and for
the gpu library, I'm attaching an input file and a data file for the
simulations.
I usually run the simulations with mpirun -np 4 and specify the package
command as "package gpu force 0 3 -1".
if you have more CPU cores, then you should use 2 or 3 CPU cores per
GPU to achieve better GPU utilization. of course, this also only makes
sense for a reasonably large system.
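as a sketch of what that looks like in practice (the binary name lmp_gpu
and the input file name are placeholders, not from the original post),
oversubscribing 4 GPUs with 2 MPI ranks each could be launched like this,
using the same old-style "package gpu force" syntax as in the post:

```
# 8 MPI ranks sharing 4 GPUs, i.e. 2 ranks per GPU
mpirun -np 8 lmp_gpu -sf gpu -in in.cellulose

# and in the input file, use all 4 devices with an automatic
# CPU/GPU load split (-1):
package gpu force 0 3 -1
```

again, this only pays off once the system is large enough that each rank
still has a meaningful amount of work.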
I'm using the 1/02/2014 version of LAMMPS; the gpu package is compiled
with DOUBLE_DOUBLE precision; 4 GPUs: Tesla C2050; CUDA version: 5.5;
NVIDIA driver version: 310.49.
Here is the output of the error:
- Using GPGPU acceleration for lj/charmm/coul/long:
- with 1 proc(s) per device.
--------------------------------------------------------------------------
GPU 0: Tesla C2050, 448 cores, 2.2/2.6 GB, 1.1 GHZ (Double Precision)
GPU 1: Tesla C2050, 448 cores, 2.2/1.1 GHZ (Double Precision)
GPU 2: Tesla C2050, 448 cores, 2.2/1.1 GHZ (Double Precision)
GPU 3: Tesla C2050, 448 cores, 2.2/1.1 GHZ (Double Precision)
--------------------------------------------------------------------------
Initializing GPU and compiling on process 0...Done.
Initializing GPUs 0-3 on core 0...Done.
Setting up run ...
Cuda driver error 1 in call at file 'geryon/nvd_kernel.h' in line 364.
[...]
CUDA error messages are often not very helpful. usually, all you get
is "it worked" or "it didn't work". for a modular GPU interface with a
CUDA/OpenCL abstraction like the GPU package, it is difficult to
provide more specific error location hints without a lot of additional
programming effort. in any case, my guess is that for such a small
system, you may have a problem because there may not be any atoms for
the MPI rank that a GPU is attached to and that can cause all kinds of
problems that are not easily seen, since people rarely run tests with
such "unreasonable" input decks.
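one quick way to test the empty-rank guess is to rerun with a single MPI
rank, so the one GPU-attached process is guaranteed to own all atoms (the
lmp_gpu binary and input file name are placeholders):

```
# single MPI rank: no domain decomposition, so the rank cannot be empty
mpirun -np 1 lmp_gpu -sf gpu -in in.cellulose

# in the input file, attach only the first GPU:
package gpu force 0 0 -1
```

if the error disappears with one rank but returns with four, that points
at the empty-rank problem rather than at the hardware.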
I'm wondering if this is a known issue and if there are already solutions
for it.
there are multiple things that you should do:
- run some GPU stress tests (there is a gpumemtest on sourceforge
IIRC, for example) and also check the GPU error status and make sure
that all your GPUs are operating without failure.
- run without GPU acceleration for pppm. specifically for double
precision, it should be faster to run pppm on the CPU concurrently
with the pair style on the GPU rather than one after the other.
- run with a (much) larger problem.
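for the second suggestion, one way to keep pppm on the CPU (with the
2014-era syntax used in the post) is to drop the global gpu suffix and
name the GPU pair style explicitly, so only the pair computation is
offloaded; a sketch, where the cutoff and accuracy values are purely
illustrative:

```
# use all 4 GPUs, automatic CPU/GPU force split (-1)
package gpu force 0 3 -1

# GPU pair style named explicitly instead of via "suffix gpu"
pair_style lj/charmm/coul/long/gpu 8.0 10.0

# plain (CPU) pppm; it can run concurrently with the GPU pair work
kspace_style pppm 1.0e-4
```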
axel.