USER-CUDA memory problem

Dear Lammps users,

I would like to report a possible memory leak in LAMMPS when compiled and run with the USER-CUDA package.
First of all, I am not a computer scientist and don't know how to determine and prove a memory leak unambiguously, so I will describe the setup and my observations in some (perhaps unnecessary) detail.
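For reference, one simple way to watch for host-side growth is to sample the process's resident set size periodically; a minimal sketch, assuming Linux (/proc) and that you pass the PID of the running LAMMPS process (the script and its arguments are my own, not part of LAMMPS):

```python
# Sample a process's resident set size (RSS) once a minute; a steady
# upward trend sustained over hours suggests a host-side memory leak.
# Assumes Linux; pass the target PID as the first argument.
import os
import sys
import time

def rss_kib(pid):
    """Return VmRSS in KiB for the given PID, read from /proc/<pid>/status."""
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return None

if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    while True:
        print(int(time.time()), rss_kib(pid))
        time.sleep(60)
```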
The most recent LAMMPS version (10Aug15) was built with the USER-CUDA package and cufft from the CUDA 7.0 library (GPU: GTX 970, driver version 346.59) and successfully tested against the provided benchmarks.
When running a relatively small system (14,320 atoms) in the NVT ensemble with the cuda package enabled, the log reports the following just after initialization:

Using device 0: GeForce GTX 970
Using LAMMPS_CUDA
CUDA: Activate GPU
CUDA: Using precision: Global: 4 X: 8 V: 8 F: 4 PPPM: 4
CUDA: VerletCuda::setup: Upload data…
Test TpA
Test BpA

CUDA: Timing of parallelisation layout with 10 loops:
CUDA: BpA TpA
0.083633 0.127412
CUDA: Total Device Memory usage post setup: 605.796875 MB

After 3 days of continuous running, the log reports:

CUDA: Timing of parallelisation layout with 10 loops:
CUDA: BpA TpA
0.074427 0.103899
CUDA: Total Device Memory usage post setup: 2047.113281 MB
CUDA: Using precision: Global: 4 X: 8 V: 8 F: 4 PPPM: 4
CUDA: VerletCuda::setup: Upload data…
Test TpA
Test BpA

CUDA: Timing of parallelisation layout with 10 loops:
CUDA: BpA TpA
0.074386 0.109083
CUDA: Total Device Memory usage post setup: -2044.886719 MB
CUDA: Using precision: Global: 4 X: 8 V: 8 F: 4 PPPM: 4
CUDA: VerletCuda::setup: Upload data…
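
The sign flip in the reported total looks like 32-bit integer overflow rather than a real measurement: a usage of about 2051.11 MB minus 2^32 bytes comes out at about -2044.89 MB, matching the log. A small sketch of that arithmetic (my reading of the log, not the actual LAMMPS code):

```python
# The reported -2044.886719 MB is consistent with a device-memory byte
# counter held in a signed 32-bit integer: once actual usage passes
# 2 GiB, the stored value wraps by 2**32 bytes and turns negative.
MB = 1024 * 1024

def as_int32(n):
    """Reinterpret a byte count modulo 2**32 as a signed 32-bit integer."""
    n %= 2 ** 32
    return n - 2 ** 32 if n >= 2 ** 31 else n

actual = int(2051.113281 * MB)          # ~2051 MB of real device usage
print(as_int32(actual) / float(MB))     # roughly -2044.8867
```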

Moreover, system RAM consumption increases by a factor of 6 (from ~600 MB to ~3600 MB; see the attachment).
Below is an excerpt from the relevant part of the input script:

fix SHAKE water shake 0.0001 20 0 b 9 a 15
fix 3 all nvt temp ${T} ${T} 50
fix 2 all momentum 10000 linear 1 1 1 rescale

label loopa
variable a loop ${LoopVar} ${Na}
thermo_style custom step temp ke pe density press vol
thermo 1000
fix densityProf polyC ave/spatial 1000 50 50000 z lower 1.0 density/number ave running file densityProfile.out.$a
fix densityProfw waterC ave/spatial 1000 50 50000 z lower 1.0 density/number ave running file densityProfilew.out.$a
run ${Nstep}
write_restart restartfile.$a
unfix densityProf
unfix densityProfw
next a
jump in.lmp loopa
quit

Of the 5 fixes used during the simulation, shake and nvt are cuda-compatible; the rest are called every 1000 steps (the last two are destroyed at the end of each cycle). The same system with the same input script, but run with the GPU package, takes about 600 MB and doesn't grow in size for a week or even longer.
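To make that comparison quantitative, a least-squares slope through periodic RSS samples separates a genuine leak from allocation noise; a quick sketch (the function name and sample format are illustrative, not from LAMMPS):

```python
# Fit a least-squares line through (time, RSS) samples; a clearly
# positive slope sustained over many hours points to a leak rather
# than ordinary allocation jitter.
def leak_rate_kib_per_hour(samples):
    """samples: list of (t_seconds, rss_kib) pairs; returns slope in KiB/hour."""
    n = len(samples)
    tm = sum(t for t, _ in samples) / float(n)
    rm = sum(r for _, r in samples) / float(n)
    num = sum((t - tm) * (r - rm) for t, r in samples)
    den = sum((t - tm) ** 2 for t, _ in samples)
    return 3600.0 * num / den

# Growth from ~600 MB to ~3600 MB over three days corresponds to a
# slope of roughly 42000 KiB/hour.
```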
Any clues or ideas?

Best regards,
–Vitaly

memory.txt (4.97 KB)

> Dear Lammps users,
>
> I would like to report a possible memory leak in LAMMPS when compiled and
> run with the USER-CUDA package.

Please note that the USER-CUDA package is at this point effectively
unmaintained. Contributed bugfixes and improvements to the code will
be accepted into the distribution, but currently nobody has indicated
that they will take over maintenance of the USER-CUDA package. So if
you want supported GPU acceleration, you have to use the GPU package.
In principle, you could also use the KOKKOS package, but it is still
rather experimental and not recommended unless you want to develop
code for the KOKKOS package and have the necessary skills for that.

> First of all, I am not a computer scientist and don't know how to determine

Knowing how to debug a memory leak and being a computer scientist are
rather unrelated.

axel.