Dear developers:
I’m a new developer working on a GPU accelerator library for a new algorithm, similar to the PPPM library. I’m using the geryon library located in lammps/lib/gpu/geryon for development. However, I’ve encountered some issues when using the UCL_Vector class.
Problem Description
When the program runs, the forces and energy are calculated correctly. However, something goes wrong when the class I defined in the LAMMPS_AL namespace is destructed. The following errors occur:
Cuda driver error 4 in call at file '/dssg/home/acct-hpc/project/00_LAMMPS/lammps/lib/gpu/geryon/nvd_memory.h' in line 85.
Cuda driver error 4 in file '/dssg/home/acct-hpc/project/00_LAMMPS/lammps/lib/gpu/geryon/nvd_memory.h' in line 85.
Cuda driver error 4 in call at file '/dssg/home/acct-hpc/project/00_LAMMPS/lammps/lib/gpu/geryon/nvd_memory.h' in line 85.
Cuda driver error 4 in file '/dssg/home/acct-hpc/hpclqz/project/00_LAMMPS/lammps/lib/gpu/geryon/nvd_memory.h' in line 85.
Cuda driver error 4 in call at file '/dssg/home/acct-hpc/project/00_LAMMPS/lammps/lib/gpu/geryon/nvd_memory.h' in line 85.
Cuda driver error 4 in file '/dssg/home/acct-hpc/project/00_LAMMPS/lammps/lib/gpu/geryon/nvd_memory.h' in line 85.
I believe these errors occur because the UCL_H_Vec<devtype> _buffer is not correctly cleared when the UCL_Vector is destroyed. Using cuda-gdb, I discovered that the _cols variable of the private member UCL_H_Vec<devtype> _buffer is not set to 0, even though UCL_H_Vec<hosttype> host and UCL_D_Vec<devtype> device have been cleared correctly.
Case in My Program
I have defined four UCL_Vector members in my class:
UCL_Vector<acctyp, acctyp> pxyz;
UCL_Vector<numtyp3, numtyp3> K;
UCL_Vector<numtyp2, numtyp2> Rho;
UCL_Vector<numtyp2, numtyp2> Rho_All;
I then call their clear() functions:
pxyz.clear();
K.clear();
Rho.clear();
Rho_All.clear();
pxyz is cleared correctly. The _cols of its _buffer is 0, as shown below:
(cuda-gdb) print pxyz
$1 = {host = {<ucl_cudadr::UCL_BaseMat> = {
_vptr.UCL_BaseMat = 0x33d1160 <vtable for ucl_cudadr::UCL_H_Vec<double>+16>, _cq = 0x0,
_kind = UCL_VIEW}, _array = 0x14d2c7643a00, _end = 0x14d2c7643a18, _row_bytes = 24, _cols = 0},
device = {<ucl_cudadr::UCL_BaseMat> = {
_vptr.UCL_BaseMat = 0x33d1140 <vtable for ucl_cudadr::UCL_D_Vec<double>+16>, _cq = 0x0,
_kind = UCL_VIEW}, _row_bytes = 24, _row_size = 0, _rows = 0, _cols = 0, _array = 22895518891008},
_buffer = {<ucl_cudadr::UCL_BaseMat> = {
_vptr.UCL_BaseMat = 0x33d1160 <vtable for ucl_cudadr::UCL_H_Vec<double>+16>, _cq = 0x0,
_kind = UCL_VIEW}, _array = 0x0, _end = 0x0, _row_bytes = 0, _cols = 0}}
However, the other three UCL_Vector objects do not behave correctly. Take K as an example. After calling K.clear(), the contents of K are:
$4 = {host = {<ucl_cudadr::UCL_BaseMat> = {
_vptr.UCL_BaseMat = 0x33d19c0 <vtable for ucl_cudadr::UCL_H_Vec<_lgpu_float3>+16>, _cq = 0x0,
_kind = UCL_VIEW}, _array = 0x4f7ee50, _end = 0x4f805c0, _row_bytes = 6000, _cols = 0},
device = {<ucl_cudadr::UCL_BaseMat> = {
_vptr.UCL_BaseMat = 0x33d19a0 <vtable for ucl_cudadr::UCL_D_Vec<_lgpu_float3>+16>, _cq = 0x0,
_kind = UCL_READ_WRITE}, _row_bytes = 6000, _row_size = 0, _rows = 0, _cols = 500,
_array = 22895518870528}, _buffer = {<ucl_cudadr::UCL_BaseMat> = {
_vptr.UCL_BaseMat = 0x33d19c0 <vtable for ucl_cudadr::UCL_H_Vec<_lgpu_float3>+16>, _cq = 0x0,
_kind = UCL_READ_WRITE}, _array = 0x14d2c7640200, _end = 0x14d2c7641970, _row_bytes = 6000,
_cols = 500}}
As a result, when the K, Rho, and Rho_All instances of UCL_Vector are destructed, they invoke the _host_free function in lib/gpu/geryon/nvd_memory.h on the still-allocated _buffer. This triggers the assertion assert(0==1) in CU_DESTRUCT_CALL(cuMemFreeHost(mat.begin())).
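To make the error-then-assert sequence easier to follow, here is a minimal sketch of how an error-checking wrapper of this kind typically behaves. It is an assumption about the general shape, not the actual CU_DESTRUCT_CALL definition in geryon. MOCK_CHECK_CALL, MockResult, mock_free_host and count_failed_frees are hypothetical names, and the mock counts failures instead of aborting so that it can be run safely.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical mock of a driver result code; 0 means success.
typedef int MockResult;
static int g_failed_checks = 0;

// Assumed shape of an error-checking wrapper: evaluate the call once,
// report a non-zero result with file and line, then record the failure.
// A real wrapper might abort at this point (for example via assert).
#define MOCK_CHECK_CALL(call)                                   \
  do {                                                          \
    MockResult err_ = (call);                                   \
    if (err_ != 0) {                                            \
      std::fprintf(stderr,                                      \
                   "Mock driver error %d in file '%s' in line %d.\n", \
                   err_, __FILE__, __LINE__);                   \
      ++g_failed_checks;                                        \
    }                                                           \
  } while (0)

// Hypothetical free routine: returns 4 once the mock context is gone,
// mirroring the error number seen in the log above.
MockResult mock_free_host(bool context_alive) {
  return context_alive ? 0 : 4;
}

// Runs n checked frees and returns how many of them failed.
int count_failed_frees(bool context_alive, int n) {
  g_failed_checks = 0;
  for (int i = 0; i < n; ++i) {
    MOCK_CHECK_CALL(mock_free_host(context_alive));
  }
  return g_failed_checks;
}
```

In the CUDA driver API, result code 4 is CUDA_ERROR_DEINITIALIZED, which would be consistent with a free being attempted after the driver or context has already been shut down. Whether that is what happens here is a guess based on the error number only.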
My Solution
I used the simplest solution: I added _buffer.clear() to the clear() function of class UCL_Vector in lib/gpu/geryon/ucl_vector.h. After recompiling with the following change, all the problems went away:
inline void clear()
{ host.clear(); _buffer.clear(); device.clear(); }
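Why this change might help can be sketched with a small self-contained mock. It assumes (without confirmation from the geryon sources) that the failure comes from a buffer that is still allocated when the object is destructed after the device context is gone. MockHostVec, MockVector, g_context_alive and late_frees_after_teardown are hypothetical stand-ins, not geryon types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins; none of these names come from geryon.
static bool g_context_alive = true;  // models a live device context
static int  g_late_frees = 0;        // frees attempted after teardown

struct MockHostVec {
  void*       array = nullptr;
  std::size_t cols  = 0;

  void alloc(std::size_t n) {
    clear();
    array = std::malloc(n);
    cols = n;
  }

  void clear() {
    if (cols == 0) return;                 // nothing owned, nothing to free
    if (!g_context_alive) ++g_late_frees;  // real code could hit a driver error
    std::free(array);
    array = nullptr;
    cols = 0;
  }

  ~MockHostVec() { clear(); }              // destructor frees whatever is left
};

struct MockVector {
  MockHostVec host, buffer;
  bool clear_buffer;                       // toggles the change under discussion

  explicit MockVector(bool cb) : clear_buffer(cb) {}

  void alloc(std::size_t n) { host.alloc(n); buffer.alloc(n); }

  void clear() {
    host.clear();
    if (clear_buffer) buffer.clear();
  }
};

// Returns how many frees happened after the mock context was torn down.
int late_frees_after_teardown(bool clear_buffer) {
  g_context_alive = true;
  g_late_frees = 0;
  {
    MockVector v(clear_buffer);
    v.alloc(64);
    v.clear();
    g_context_alive = false;  // context goes away before v is destructed
  }                           // ~MockVector runs here
  g_context_alive = true;
  return g_late_frees;
}
```

Calling late_frees_after_teardown(false) reports one late free (the buffer), while late_frees_after_teardown(true) reports none. This only shows that the symptom is consistent with a leftover buffer; it does not explain why pxyz ends up with an empty _buffer while K, Rho, and Rho_All do not.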
I know this fix may seem crude, and it may only hide a mistake in my own code, but I haven’t been able to find the error in my program. Has anyone run into this kind of problem, or can anyone tell me whether UCL_Vector::clear() is expected to clear _buffer as well?
Sincere thanks to everyone.