Insufficient memory on accelerator (src/GPU/pair_lj_cut_coul_long_gpu.cpp:136)

zchen · October 18, 2021, 9:32am

Hello all:
My simulation stops with the following error:
ERROR on proc 10: Insufficient memory on accelerator (src/GPU/pair_lj_cut_coul_long_gpu.cpp:136)

The system has ~500,000 atoms (most of them water, added to satisfy the minimum image convention under periodic boundary conditions). I guess the system is too big, since LAMMPS ran without error when I built a smaller system as input. Is my guess right? (If so, I need to reduce the number of water molecules.)
Or is there another cause that I have missed?

Sincerely,
Zhongquan Chen

akohlmey · October 18, 2021, 9:57am

When reporting such issues you always need to provide some basic info so that it is easier to give suitable advice:
What version of LAMMPS are you using, and what kind of GPU do you have?
How did you configure LAMMPS and the GPU package?
Please provide the output of ocl_get_devices or nvc_get_devices, depending on which one was compiled.

It is a very bad idea in science to guess. If the number of (local) atoms is the cause, then the failure will be due to the neighbor list exceeding the available GPU memory. This is easy to confirm empirically by reducing either the cutoff(s) or the system size.
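To get a feel for why the cutoff and the atom count both matter, here is a rough back-of-the-envelope sketch in Python. It is not how the GPU package actually sizes its buffers. It assumes a uniform atom density of about 0.1 atoms per cubic Angstrom (roughly liquid water), a neighbor skin of 2.0 Angstrom, 4 bytes per neighbor entry, and a full neighbor list. The function name and the example cutoffs are hypothetical and chosen only for illustration.

```python
import math


def estimate_neighbor_list_bytes(n_local_atoms, number_density, cutoff,
                                 skin=2.0, bytes_per_entry=4, full_list=True):
    """Rough neighbor list size in bytes.

    Ignores padding, per-atom bookkeeping, and any other GPU buffers,
    so the real footprint can be noticeably larger.
    """
    radius = cutoff + skin  # neighbor lists are built out to cutoff + skin
    # Average number of atoms inside a sphere of that radius.
    neighbors_per_atom = number_density * (4.0 / 3.0) * math.pi * radius ** 3
    if not full_list:
        neighbors_per_atom /= 2.0  # a half list stores each pair only once
    return n_local_atoms * neighbors_per_atom * bytes_per_entry


# Assumption: ~0.1 atoms per cubic Angstrom, roughly liquid water.
WATER_ATOM_DENSITY = 0.1

if __name__ == "__main__":
    # Hypothetical cutoffs in Angstrom; n_local_atoms is the number of atoms
    # handled by one GPU, which is smaller than the total when several GPUs
    # share the work.
    for cutoff in (12.0, 10.0, 8.0):
        size = estimate_neighbor_list_bytes(500_000, WATER_ATOM_DENSITY, cutoff)
        print(f"cutoff {cutoff:4.1f} A: ~{size / 1024**3:.2f} GiB")
```

The estimate grows linearly with the number of atoms per GPU but with the cube of (cutoff + skin), which is why a modest reduction of the cutoff can free more memory than removing a comparable fraction of the atoms.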

Please note that most of LAMMPS is not subject to minimum image requirements. So if the science of your model doesn’t require a larger cell (for example because you need to avoid interactions of the periodic images of a solute with themselves), you don’t need to have it simply because of the cutoff.
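As an illustration of the kind of check that the science may call for, the following Python sketch tests whether a solute could come within the real-space cutoff of its own periodic image along one box axis. It is a simplified geometric argument, not something LAMMPS does for you: the function names are hypothetical, the solute is treated as a rigid object with a fixed extent along that axis, and a long-range solver still couples periodic images regardless of the cutoff.

```python
def solute_sees_own_image(box_length, solute_extent, cutoff):
    """Return True if the solute can be within `cutoff` of its periodic image.

    The nearest image of the solute is shifted by `box_length`, so the gap
    between the solute and that image is box_length - solute_extent.
    """
    gap = box_length - solute_extent
    return gap < cutoff


def min_box_length(solute_extent, cutoff):
    """Smallest box length (same axis) that keeps the image beyond the cutoff."""
    return solute_extent + cutoff


if __name__ == "__main__":
    # Hypothetical numbers in Angstrom, for illustration only.
    print(solute_sees_own_image(box_length=50.0, solute_extent=30.0, cutoff=12.0))
    print(min_box_length(solute_extent=30.0, cutoff=12.0))
```

If the box is already larger than this bound along every axis, extra water beyond it only adds to the memory and compute cost.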

zchen · October 18, 2021, 11:37am

Hello Axel:
Thank you for the tips. The LAMMPS version is:
Large-scale Atomic/Molecular Massively Parallel Simulator - 5 May 2020
Git info (unstable / patch_5May2020-modified)

The output of ocl_get_devices is shown below. So yes, the error is due to the neighbor list exceeding the available memory, since it goes away when I use a smaller system size and reduce the cutoffs.

Found 1 platform(s).
Platform 0:
Device 0: “NVIDIA GeForce GTX 1080 Ti”
Type of device: GPU
Supported OpenCL Version: 3.0
Is a subdevice: No
Double precision support: Yes
Total amount of global memory: 10.9165 GB
Number of compute units/multiprocessors: 28
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Maximum group size (# of threads per block) 1024
Maximum item sizes (# threads for each dim) 1024 x 1024 x 64
Clock rate: 1.62 GHz
ECC support: No
Device fission into equal partitions: No
Device fission by counts: No
Device fission by affinity: No
Maximum subdevices from fission: 1
Shared memory system: No
Subgroup support: No
Shuffle support: Yes
Device 1: “NVIDIA GeForce GTX 1080 Ti”
Type of device: GPU
Supported OpenCL Version: 3.0
Is a subdevice: No
Double precision support: Yes
Total amount of global memory: 10.9165 GB
Number of compute units/multiprocessors: 28
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Maximum group size (# of threads per block) 1024
Maximum item sizes (# threads for each dim) 1024 x 1024 x 64
Clock rate: 1.62 GHz
ECC support: No
Device fission into equal partitions: No
Device fission by counts: No
Device fission by affinity: No
Maximum subdevices from fission: 1
Shared memory system: No
Subgroup support: No
Shuffle support: Yes

nvc_get_devices:
Found 1 platform(s).
CUDA Driver Version: 11.40

Device 0: “NVIDIA GeForce GTX 1080 Ti”
Type of device: GPU
Compute capability: 6.1
Double precision support: Yes
Total amount of global memory: 10.9165 GB
Number of compute units/multiprocessors: 28
Number of cores: 5376
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum group size (# of threads per block) 1024 x 1024 x 64
Maximum item sizes (# threads for each dim) 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.62 GHz
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default
Concurrent kernel execution: Yes
Device has ECC support enabled: No

Device 1: “NVIDIA GeForce GTX 1080 Ti”
Type of device: GPU
Compute capability: 6.1
Double precision support: Yes
Total amount of global memory: 10.9165 GB
Number of compute units/multiprocessors: 28
Number of cores: 5376
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum group size (# of threads per block) 1024 x 1024 x 64
Maximum item sizes (# threads for each dim) 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.62 GHz
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default
Concurrent kernel execution: Yes
Device has ECC support enabled: No

akohlmey · October 18, 2021, 11:59am

You may want to upgrade to the latest LAMMPS version. There have been significant improvements to OpenCL support since your version was released, and the GPU memory requirements have also been lowered somewhat, though I am not sure by how much.