I wish to run simulations with up to 100,000 atoms on an NVIDIA GPU with 1 GB of memory, using the Tersoff potential. I am using the USER-CUDA package with double precision and compute capability 3.0 (make precision=2 arch=30).
I can run > 200k atoms for a cubic simulation cell; however, when dim z >> dim x, dim y, this number drops to about 20k atoms, which uses about 800 MB of CUDA memory. This is much more than the 50 MB that the CPU requires for the same run.
Is this normal or am I doing something wrong?
Can I make some simple code modification to reduce the memory required (possibly at the cost of reduced speed)?
If I get a card with more CUDA cores, do the memory requirements increase?
Christian can comment on memory usage by the USER-CUDA package.
I have managed to work around this somewhat by reducing the skin distance, but the memory problem has now become an issue again. I would appreciate any help/information.
When you have dim z >>> dim x and dim y, are dim x and dim y approaching the neighbor cutoff? Your memory footprint also depends on the ghost atoms, and the ratio of ghost to non-ghost atoms is minimal for a cubic box, while it can grow without bound as one of the dimensions approaches zero.
A typical example is dim x = dim y = 10 A (Angstrom) and dim z = 5000 A, with a cutoff of around 2 A.
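To make the ghost-atom argument concrete, here is a rough back-of-the-envelope sketch (my own, not from the package) that estimates the ghost-to-local ratio from box geometry alone, assuming uniform atom density and a ghost shell one cutoff thick around every face:

```python
# Hedged sketch: estimate the ghost-to-local atom ratio from box geometry,
# assuming uniform density and ghosts within one cutoff of each box face.

def ghost_ratio(lx, ly, lz, cutoff):
    """Ratio of ghost-shell volume to box volume (periodic in all dims)."""
    local = lx * ly * lz
    total = (lx + 2 * cutoff) * (ly + 2 * cutoff) * (lz + 2 * cutoff)
    return (total - local) / local

# Elongated cell from the example above: 10 x 10 x 5000 A, cutoff ~2 A
print(ghost_ratio(10, 10, 5000, 2.0))       # ~0.96: almost one ghost per local atom

# Cubic cell of the same volume (~79.4 A per side)
side = (10 * 10 * 5000) ** (1 / 3)
print(ghost_ratio(side, side, side, 2.0))   # ~0.16
```

So for the same atom count, the elongated box carries roughly six times the ghost fraction of the equivalent cube, which is consistent with the memory blow-up you see.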
For such an elongated cell, there should be fewer ghost atoms when the cell is divided along the z dimension for assignment to processors.
e.g. for 2 processors, 2 x 10A x 10A x 2500A has fewer ghost atoms than 2 x 5A x 10A x 5000A, because the first assignment creates a 10A x 10A boundary between processors, while the second creates a 10A x 5000A boundary.
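The two-processor comparison above can be checked numerically. This small sketch (my own illustration) counts only the ghost slabs exchanged at the processor boundaries, each one cutoff thick:

```python
# Hedged sketch: compare the inter-processor ghost-slab volume for two ways
# of splitting a 10 x 10 x 5000 A box across 2 processors (cutoff ~2 A).
# Only the communicated slabs at the processor-processor faces are counted.

def slab_volume(face_area, cutoff, n_faces=2):
    """Ghost volume exchanged per processor: n_faces slabs of the given area."""
    return n_faces * face_area * cutoff

cutoff = 2.0
# Split along z: two 10 x 10 x 2500 subdomains -> 10A x 10A boundary faces
print(slab_volume(10 * 10, cutoff))     # 400 A^3 per processor
# Split along x: two 5 x 10 x 5000 subdomains -> 10A x 5000A boundary faces
print(slab_volume(10 * 5000, cutoff))   # 200000 A^3 per processor
```

Under these assumptions the z-split exchanges a ghost volume about 500 times smaller than the x-split, so the choice of decomposition axis matters a great deal for this cell shape.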
I do not know whether this optimization is already applied, or whether I can force it with a simple (possibly hard-coded) modification.