"ERROR on proc 3: Insufficient memory on accelerator" when running GPU using lammps_v29Sep21

YouzhiHao · December 2, 2021, 8:43am

Recently, I installed lammps-stable-29Sep2021 on Ubuntu 20.04.3 LTS (GNU/Linux 5.11.0-41-generic x86_64) with an “NVIDIA GeForce RTX 3080 Ti”. However, when I run a LAMMPS job with the GPU package enabled, I get the error “ERROR on proc 3: Insufficient memory on accelerator”.
Below is the information about my LAMMPS build and system:
(1) LAMMPS was built with the GPU (CUDA) and OPENMP packages, among others, using the following CMake commands:
hyz@slscc:~/Programs/lammps-29Sep2021/build$ cmake …/cmake/ -D LAMMPS_MEMALIGN=64 -D FFT=FFTW3 -D PKG_COMPRESS=yes -D PKG_GPU=yes -D GPU_API=cuda -D GPU_PREC=mixed -D GPU_ARCH=sm_86 -D PKG_OPENMP=yes -D PKG_KSPACE=yes -D PKG_MOLECULE=yes -D PKG_RIGID=yes -D WITH_GZIP=yes -D WITH_JPEG=yes -D WITH_PNG=yes -D WITH_FFMPEG=yes -D FFMPEG_EXECUTABLE=/usr/local/bin
hyz@slscc:~/Programs/lammps-29Sep2021/build$ cmake --build ./ -j 32
The LAMMPS build completes without problems, as shown in the picture below:

When I type “lammps -help”, the following details about the LAMMPS executable are shown:
OS: Linux “Ubuntu 20.04.3 LTS” 5.11.0-41-generic on x86_64
Compiler: GNU C++ 9.3.0 with OpenMP 4.5
C++ standard: C++11
MPI v3.1: Open MPI v4.1.2, package: Open MPI hyz@slscc Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021
Accelerator configuration:
GPU package API: CUDA
GPU package precision: mixed
OPENMP package API: OpenMP
OPENMP package precision: double
Compatible GPU present: yes
Active compile time flags:
-DLAMMPS_GZIP
-DLAMMPS_PNG
-DLAMMPS_JPEG
-DLAMMPS_FFMPEG
-DLAMMPS_SMALLBIG
sizeof(smallint): 32-bit
sizeof(imageint): 32-bit
sizeof(tagint): 32-bit
sizeof(bigint): 64-bit
Installed packages:
COMPRESS GPU KSPACE MOLECULE OPENMP RIGID

(2) The Linux system is Ubuntu 20.04.3 LTS (GNU/Linux 5.11.0-41-generic x86_64).

(3) CUDA was installed from the Nvidia web site following its instructions. The CUDA version is 11.5, with driver 495.44, as shown below:
hyz@slscc:~$ nvidia-smi
Thu Dec 2 16:42:34 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 495.44       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce …  Off    | 00000000:02:00.0  On |                  N/A |
|  0%   38C    P8    28W / 400W |     63MiB / 12052MiB |      0%      Default |
|                               |                      |                  N/A |

The command “./nvt_get_devices” gave the output shown in the picture:

(4) Jobs run with LAMMPS:
My LAMMPS input script contains:
if "$${pk} == gpu" then &
"package gpu 1 neigh no newton off split 1.0" &
elif "$${pk} == cpu" "package omp 4"
(Note: $$ is one dollar sign in this web post)
Then I run it from the command line with either GPU or CPU (omp):
test1: $ mpirun -np 4 lammps -in inputlammps.txt -var pk cpu
The run is OK and the steps proceed.

test2: $ mpirun -np 4 lammps -in inputlammps.txt -var pk gpu
Then I get the error shown in the picture below:

What I have tried so far:
1> Many different keyword settings in “package gpu 1 neigh no newton off split 1.0”.
2> Rebuilding LAMMPS with cuda_arch=sm_80 or sm_86.
3> Building LAMMPS with the traditional make.
4> Additionally, LAMMPS with GPU (CUDA) and mpirun worked very well with the Nvidia GTX 1080 on Ubuntu 16.04. That system has since been disk-wiped and reinstalled with Ubuntu 20.04, and the graphics card upgraded to an Nvidia RTX 3080 Ti.
5> I did some research on “shared memory” and typed “ipcs -m”, which shows this:
hyz@slscc:~$ ipcs -m
Shared Memory Segments
key        shmid      owner      perms      bytes      nattch     status
0x00000000 18         shenliu    600        524288     2          dest
0x00000000 19         shenliu    600        4194304    2          dest
0x00000000 38         shenliu    600        4194304    2          dest
0x00000000 39         shenliu    600        524288     2          dest
hyz@slscc:~$ free -t -m
              total        used        free      shared  buff/cache   available
Mem:          64190        1331       58332           5        4527       62146
Swap:          7818           0        7818
Total:        72009        1331       66151

The system has 5 users, but the Shared Memory Segments list shows only one user. That looks odd, but I could not make anything of it.

So far I am still struggling with this GPU error, “ERROR on proc 3: Insufficient memory on accelerator”. Please point me in the right direction, thank you!

akohlmey · December 2, 2021, 11:06am

Can your input run without GPU acceleration? And what is the output from that?
A lot of required information is missing, for example about the kind of system that you are running.

This cannot make a difference.

This also cannot make a difference.

Shared memory has nothing to do with GPUs.

How much GPU RAM did this GPU have? What was the LAMMPS version? Do you have some log file of a successful run?

There is not much that can be done based on the limited information you provide. To do any testing or debugging, it would be required to have access to your specific input deck.
Also, you could try and verify if you can run the LAMMPS input decks in the “bench” and “examples” (especially “peptide” and “rdf-adf”) to see that there is no problem in general, but rather something specific to your input deck.
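
One way to script such a check, as a minimal Python sketch: it only builds the mpirun command lines for a GPU run of an example input, with different numbers of MPI ranks sharing the one GPU. The executable name `lammps`, the `in.peptide` input, and the flags are taken from commands posted in this thread; the helper name `gpu_run_command` and the rank counts 1, 2, 4 are assumptions chosen for illustration.

```python
import shlex


def gpu_run_command(input_file, nranks, ngpus=1, exe="lammps"):
    """Return the argv list for one GPU-accelerated LAMMPS run.

    `exe` defaults to the executable name used in this thread; many
    installations call it `lmp` instead (assumption: adjust as needed).
    """
    return [
        "mpirun", "-np", str(nranks),
        exe, "-in", input_file,
        "-sf", "gpu",              # apply the gpu suffix to supported styles
        "-pk", "gpu", str(ngpus),  # same as "package gpu <ngpus>" in the input
    ]


# Hypothetical sweep: all ranks share the single GPU, so the number of
# ranks is varied to see whether the error depends on it.
for nranks in (1, 2, 4):
    print(shlex.join(gpu_run_command("in.peptide", nranks)))
```

Each printed line can be pasted into a shell in the directory that holds the input file. If an input runs with 1 rank but fails with 4, that would point toward the per-rank memory footprint rather than the input itself.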

YouzhiHao · December 2, 2021, 12:06pm

1> LAMMPS runs OK without the GPU, for example with the OPENMP package. The system is Ubuntu 20.04.3 LTS (GNU/Linux 5.11.0-41-generic x86_64), and the simulations study water/gas confined in silica pores using an atomistic force field.
2> The RAM information for the Nvidia GTX 1080 is:

The LAMMPS version is lammps-11Aug2017.
An excerpt of the log file from a successful run:

The job above used the GPU package, but no GPU information is logged because it is only echoed to the screen, which I did not record.

3> Exactly the same simulation job was run on both the previous GTX 1080 + Ubuntu 16.04 and the current upgraded RTX 3080 Ti + Ubuntu 20.04. The previous setup runs OK, while the current one fails with the ERROR.

4> I ran the examples in the LAMMPS folder as follows:
hyz@slscc:~/Programs/lammps-29Sep2021/examples/peptide$ mpirun -np 4 lammps -in in.peptide -pk gpu 1 -sf gpu

The simulation runs OK.

So what confuses me is that the same job ran on the GPU with the previous LAMMPS_11Aug2017 (GTX 1080 + Ubuntu 16.04), but does not run correctly with the current LAMMPS_29Sep2021 (RTX 3080 Ti + Ubuntu 20.04), even though no syntax errors in the input file or the LAMMPS command line are reported.

akohlmey · December 2, 2021, 12:46pm

It is impossible to make any further assessment of the situation without verifying the issue independently which requires access to your complete input deck.

There has been some significant refactoring in the GPU package internal code and that may have impacted the memory requirements (e.g. previous versions may have allocated less memory on the GPU leading to occasional unintentional overwriting of data). Similarly, different driver/cuda versions can have some (minor) impact, but not a lot.

YouzhiHao · December 3, 2021, 8:33am

The ERROR vanished when I uninstalled the previous stand-alone Nvidia driver and reinstalled the CUDA package that contains both the driver and the CUDA toolkit!
Here are the details:
1> Previously I installed a stand-alone Nvidia driver v495.44, and then installed CUDA v11.5 without its own driver v495.29.05, because the installer indicated that driver v495.44 was already installed.
2> Now the stand-alone Nvidia driver v495.44 is uninstalled, and the whole CUDA package is installed, including both driver v495.29.05 and the CUDA toolkit, etc.
3> Recompile lammps_29Sep21.
4> Rerun the jobs.

However, with the rebuilt lammps_29Sep21, jobs with 19493 atoms tend to occupy all of the GPU memory, which finally leads to the Insufficient Memory ERROR.
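
Before recompiling, it can help to print the driver and toolkit versions side by side. The following Python sketch parses the `nvidia-smi` banner and the `nvcc --version` output. The function names are hypothetical, and the regular expressions assume the banner format shown earlier in this thread and the usual `release X.Y` line printed by `nvcc`. It only checks that the toolkit is not newer than the CUDA version the driver reports, so it would not by itself have flagged the two 495.x drivers described above.

```python
import re
import subprocess


def parse_smi_banner(text):
    """Extract (driver_version, max_cuda_version) from nvidia-smi output."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return (driver.group(1) if driver else None,
            cuda.group(1) if cuda else None)


def parse_nvcc_release(text):
    """Extract the toolkit release (e.g. '11.5') from `nvcc --version` output."""
    match = re.search(r"release\s+([\d.]+)", text)
    return match.group(1) if match else None


def as_tuple(version):
    """Turn '11.5' into (11, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def toolkit_supported(max_cuda, toolkit):
    """True if the toolkit release is not newer than what the driver reports."""
    return as_tuple(toolkit) <= as_tuple(max_cuda)


def query(cmd):
    """Run a command and return its stdout (requires the tool on PATH)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

On a machine with both tools installed, `parse_smi_banner(query(["nvidia-smi"]))` and `parse_nvcc_release(query(["nvcc", "--version"]))` give the values to pass to `toolkit_supported`.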

1> I downloaded and built an earlier LAMMPS version, 24Dec2020, and compared it with the current version, 29Sep2021, using my job with the same parameters. The job contains thousands of water and methane molecules inside silicate slit pores, uses the full atom style, and runs to thermal equilibrium.
Here are the comparison results:

The job with 19493 atoms shows that LAMMPS_29Sep2021 tends to consume nearly all of the GPU memory, while LAMMPS_24Dec2020 consumes only 1/5 of the GPU memory.
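
To put numbers on such a comparison without screenshots, the total GPU memory in use can be sampled while the job runs. This is a minimal Python sketch, assuming `nvidia-smi` is on the PATH. It relies on the `--query-gpu=memory.used,memory.total --format=csv,noheader,nounits` query; the function names and the one-second sampling interval are hypothetical choices.

```python
import subprocess
import time

QUERY = ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory_csv(text):
    """Parse lines like '63, 12052' into (used_MiB, total_MiB) per GPU."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def peak_usage(samples):
    """Return the largest used/total fraction among the samples."""
    return max(used / total for used, total in samples)


def sample_gpu0(count, interval=1.0):
    """Poll nvidia-smi `count` times and collect (used, total) for GPU 0."""
    samples = []
    for _ in range(count):
        out = subprocess.run(QUERY, capture_output=True, text=True,
                             check=True).stdout
        samples.append(parse_memory_csv(out)[0])
        time.sleep(interval)
    return samples
```

Calling `peak_usage(sample_gpu0(60))` in a second terminal during the first minute of each run gives one fraction per LAMMPS version that can be compared directly. The idle reading of 63MiB / 12052MiB from the nvidia-smi output earlier in the thread would parse as `(63, 12052)`.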

2> The example in lammps-29Sep2021/examples/peptide was tested using $ mpirun -np 8 lammps_version -in in.peptide -pk gpu 1 -sf gpu. The comparison results are:

The example test shows that both LAMMPS versions use the same amount of GPU memory.

These two LAMMPS versions (24Dec2020 vs. 29Sep2021) differ considerably in GPU memory usage depending on the simulation job.
It confuses me, but in the end I can use LAMMPS_24Dec2020 to do my jobs. Thanks!

In addition, I built several LAMMPS versions: 9Oct2020, 24Dec2020, 10Feb2021, 29Sep2021, and 27Oct2021. For my own job specifically, the 2021 versions tend to consume much more GPU memory than the 2020 versions.
Here are the main commands selected from my job script:
units real
dimension 3
atom_style full
boundary p p p
comm_style brick
comm_modify mode single group all vel no
atom_modify id yes map array sort 1000 0.0
if $$(is_active(package,gpu)) then &
"pair_style lj/cut/coul/long/gpu $${cutoff}" &
"kspace_style pppm 1.0e-4" &
"bond_style harmonic" &
"angle_style harmonic"
pair_modify mix arithmetic tail no
molecule ch4 mol.ch4.txt toff $${ntoff}
molecule h2o mol.water-spc.txt
lattice fcc $${fccCH4}
create_atoms 0 region RvoidL mol ch4 $${lseed} units box
lattice fcc $${fccH2O}
create_atoms 0 region RvoidR mol h2o $${lseed} units box
fix fnvtK pta nvt temp $${temp} $${temp} 100.0
fix_modify fnvtK dynamic/dof yes temp Tpta
compute Tmet met temp/com
compute Stresswat water stress/atom Twat
fix mrigid met rigid/nvt/small molecule temp $${temp} $${temp} 100.0 mol ch4
fix fnvtshakewater water nvt temp $${temp} $${temp} 100.0
fix_modify fnvtshakewater dynamic/dof yes temp Twat
fix wshake water shake 0.0001 50 0 b 3 a 3 mol h2o
fix_modify wshake dynamic/dof yes
dump dumpXYZ all xyz $${NdumpXYZ} ./.xyz
thermo_style custom step v_Nmet v_Nwat v_poresize dt time
thermo $$(v_Nfreqv_nksteps/250)
timestep $${dtstep}
run 5000000

akohlmey · December 3, 2021, 3:25pm

This is all too confusing and convoluted for me to make sense of it and you don’t provide sufficient material to verify this independently.

Please note that there is a difference between a simulation that can finish and a simulation that is correct, and I trust the most recent version of LAMMPS more to be correct.