Dear LAMMPS users,
I would like to report a possible memory leak in LAMMPS when compiled and run with the USER-CUDA package.
First of all, I am not a computer scientist and don't know how to determine and prove the memory leak unambiguously, so I will describe the setup and my observations in some (maybe unnecessary) detail.
The most recent LAMMPS version (10Aug15) was built with the USER-CUDA package and cufft from the CUDA 7.0 libraries (GPU GTX 970, driver version 346.59) and successfully tested against the provided benchmarks.
When running a relatively small system (14320 atoms) in the NVT ensemble with the cuda package enabled, the log reports the following just after initialization:
Using device 0: GeForce GTX 970
Using LAMMPS_CUDA
CUDA: Activate GPU
CUDA: Using precision: Global: 4 X: 8 V: 8 F: 4 PPPM: 4
CUDA: VerletCuda::setup: Upload data…
Test TpA
Test BpA
CUDA: Timing of parallelisation layout with 10 loops:
CUDA: BpA TpA
0.083633 0.127412
CUDA: Total Device Memory usage post setup: 605.796875 MB
After 3 days of continuous running, the log reports:
CUDA: Timing of parallelisation layout with 10 loops:
CUDA: BpA TpA
0.074427 0.103899
CUDA: Total Device Memory usage post setup: 2047.113281 MB
CUDA: Using precision: Global: 4 X: 8 V: 8 F: 4 PPPM: 4
CUDA: VerletCuda::setup: Upload data…
Test TpA
Test BpA
CUDA: Timing of parallelisation layout with 10 loops:
CUDA: BpA TpA
0.074386 0.109083
CUDA: Total Device Memory usage post setup: -2044.886719 MB
CUDA: Using precision: Global: 4 X: 8 V: 8 F: 4 PPPM: 4
CUDA: VerletCuda::setup: Upload data…
Note the negative device-memory figure in the last report: the number appears to have wrapped around after passing ~2 GB, presumably because the usage counter is a signed 32-bit value. Moreover, system RAM consumption has increased by a factor of 6 (from ~600 MB to ~3600 MB, see the attachment).
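To make the growth easier to see, here is a small helper I put together (the function names are mine, not part of LAMMPS) that pulls the reported device-memory figures out of the log and flags the leak pattern:

```python
import re

# Matches log lines such as:
#   CUDA: Total Device Memory usage post setup: 605.796875 MB
PATTERN = re.compile(r"Total Device Memory usage post setup:\s*(-?\d+\.\d+)\s*MB")

def device_memory_samples(log_text):
    """Return the reported device-memory figures (MB) in log order."""
    return [float(m.group(1)) for m in PATTERN.finditer(log_text)]

def looks_like_leak(samples, slack_mb=1.0):
    """Heuristic: usage keeps growing, or has wrapped negative
    (a sign of a signed 32-bit counter overflowing past ~2 GB)."""
    if any(s < 0 for s in samples):
        return True
    return all(b > a + slack_mb for a, b in zip(samples, samples[1:]))

# The three figures reported above:
log = """
CUDA: Total Device Memory usage post setup: 605.796875 MB
CUDA: Total Device Memory usage post setup: 2047.113281 MB
CUDA: Total Device Memory usage post setup: -2044.886719 MB
"""
print(device_memory_samples(log))                  # -> [605.796875, 2047.113281, -2044.886719]
print(looks_like_leak(device_memory_samples(log))) # -> True
```

Running this over the full log.lammps shows the figure climbing on every cycle of the loop below until it wraps.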
Below is an excerpt of the relevant part of the input script:
fix SHAKE water shake 0.0001 20 0 b 9 a 15
fix 3 all nvt temp ${T} ${T} 50
fix 2 all momentum 10000 linear 1 1 1 rescale
label loopa
variable a loop ${LoopVar} ${Na}
thermo_style custom step temp ke pe density press vol
thermo 1000
fix densityProf polyC ave/spatial 1000 50 50000 z lower 1.0 density/number ave running file densityProfile.out.$a
fix densityProfw waterC ave/spatial 1000 50 50000 z lower 1.0 density/number ave running file densityProfilew.out.$a
run ${Nstep}
write_restart restartfile.$a
unfix densityProf
unfix densityProfw
next a
jump in.lmp loopa
quit
Of the 5 fixes used during the simulation, shake and nvt are CUDA-compatible, and the rest are invoked only every 1000 steps or less often (the last 2 are deleted at the end of each cycle). The same system with the same input script, but run with the GPU package, takes about 600 MB and doesn't grow in size for a week or even longer.
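For reference, this is roughly how I watched the host-side RSS over those days (a Linux /proc sketch; in the real run the pid was that of the LAMMPS process, here the shell just watches itself, and the one-second interval is only for illustration):

```shell
# Sample the resident set size of a process over time (Linux /proc).
# pid would normally come from the running LAMMPS binary; here we
# monitor this shell itself so the snippet is self-contained.
pid=$$
for i in 1 2 3; do
    grep VmRSS "/proc/$pid/status"   # e.g. "VmRSS:    3456 kB"
    sleep 1
done
```

Logging this once a minute and plotting it is how I produced the attached memory.txt.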
Any clues or ideas?
Best regards,
–Vitaly
memory.txt (4.97 KB)