How to save GPU memory?

Dear LAMMPS users,

Hello, I’m having trouble with memory usage when using the GPU.
With the GPU, many times more memory is used than without it.

In my calculation I use about 15 million molecules on 8 GPUs (Tesla A100, 40 GB).
But the following error appears during the calculation, and the run stops.

ERROR on proc 1: Insufficient memory on accelerator (src/GPU/pair_lj_cut_gpu.cpp:121)

How can I save GPU memory?

Cheers.

Please don’t use the [lammps-users] prefix in your subject line. That prefix only appears on mailing list messages archived in the forum (as a read-only category). Using it for your post confuses people into thinking it may be a mailing list message that has to be answered on the mailing list (while the mailing list archive is read-only).

There is not enough information here to make any further comments, e.g. what is your LAMMPS version?

The error message may be misleading, but for closer scrutiny we would need to see your input and log files.

I’m sorry for using the prefix.
I won’t use it again.

There is not enough information here to make any further comments, e.g. what is your LAMMPS version?
The error message may be misleading, but for closer scrutiny we would need to see your input and log files.

I see.
My LAMMPS version is the latest stable version, 29 Sep 2021.
I built LAMMPS with CMake, with MPI and GPU support.
The GPU is an NVIDIA A100 (Ampere architecture).

My input script is the following.

units           lj
dimension       3
boundary        p p p
atom_style      atomic

# atoms fill a sphere of radius 160 inside a much larger empty box
region          dere sphere 0 0 0 160 units box
region          imas block -400 400 -400 400 -400 400 units box
create_box      1 imas

lattice         fcc 1.088
create_atoms    1 region dere
mass            * 1.0
pair_style      lj/cut 5.0
pair_coeff      * * 1 1

reset_timestep  0
timestep        0.0078125
thermo          200
thermo_style    custom step lx ly lz temp

fix             1 all nvt temp 0.8 0.8 1.0
dump            1 all custom 100 dump_file/dump.*.melt id x y z vx vy vz

run             10000

I am simulating LJ molecules distributed in a sphere of radius 160.
In this case, about 16 million molecules are used.
The calculation does work when the cutoff is decreased to 3.0.

It also completes when I change the radius of the sphere to 150.
In that case, about 13 million molecules are used.

Please note: the LJ potential describes individual atoms/particles, not molecules (they may represent molecules in your model, but to avoid confusion when discussing LAMMPS it is better to stick with the LAMMPS convention, where a molecule is built from multiple atoms connected via bond potentials or similar).

It looks like you really are running out of memory. In classical MD codes like LAMMPS, the largest amount of RAM is consumed by the neighbor list. In three dimensions the number of neighbors per atom grows as O(r**3) with the cutoff radius, and the total neighbor-list size grows as O(N) with the number of particles.
You can verify this by running on the CPU and looking at the memory consumption info and (more importantly) the neighbor list info in the post-run summary (just do a “run 0” in the input). When you increase the cutoff you should see a significant increase in the number of neighbors. A 5 sigma cutoff is rather large (it corresponds to about 20 angstroms in real units, where cutoffs of 10-12 angstroms are common).
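
To put a rough number on it (assuming the default neighbor skin of 0.3 for lj units and the density of 1.088 set by your lattice command): a half neighbor list stores about

0.5 * (4/3)*pi*(rc + skin)**3 * rho ~= 0.5 * 4.19 * 5.3**3 * 1.088 ~= 340 neighbors per atom

for rc = 5.0, but only about 340 / (5.3/3.3)**3 ~= 80 for rc = 3.0, i.e. roughly a factor of 4 in neighbor-list storage. On top of that, as far as I recall, the GPU package uses a full neighbor list on the device, which roughly doubles the per-atom storage compared to the half list used on the CPU.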

Thank you for the note, and for your kind answer.

You can verify this by running on the CPU and looking at the memory consumption info and (more importantly) the neighbor list info in the post-run summary (just do a “run 0” in the input).

Actually, I have already run this on the CPU (120 MPI ranks), although the calculation time is very long.
In that calculation I used 16323427 atoms (cutoff = 5.0).
The average per-MPI-rank memory allocation is 816.4 Mbyte (which adds up to about 95 GB in total),
and the average number of neighbors per atom is about 330.
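
Multiplying these numbers out (and assuming roughly 4 bytes per stored neighbor index), the neighbor list alone is already about

16323427 atoms * 330 neighbors/atom * 4 bytes ~= 20 GB

in total, before positions, velocities, forces and everything else, so the ~95 GB summed over the 120 ranks looks plausible.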

So I think I now understand where the large memory consumption comes from.

May I ask a follow-up question?
You wrote that “a 5 sigma cutoff is rather large (it corresponds to about 20 angstroms in real units, where cutoffs of 10-12 angstroms are common)”.
This means the cutoff is typically about 3 sigma.
Is that done to save computation?

Yes, you trade accuracy against performance. Since the attractive branch of the 12-6 LJ potential decays as 1/r**6, the interaction has already weakened quite significantly at that distance. This is particularly true for bulk liquids or solids, where there is also a large amount of error cancellation (less so for systems with interfaces).
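
To put numbers on it: with U(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6) the well depth is eps, and

U(2.5*sigma) ~= -0.016*eps
U(3.0*sigma) ~= -0.0055*eps
U(5.0*sigma) ~= -0.00026*eps

so at 3 sigma the pair energy is already down to about half a percent of the well depth, while the number of pairs keeps growing with the cube of the cutoff.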

If you want more accuracy and still reduce the computational cost, you can look into using a long-range solver also for the r**6 part of the LJ interaction, via pair_style lj/long/coul/long (see the LAMMPS documentation) with the coulomb part turned off, combined with kspace_style pppm/disp (a rough sketch of the relevant commands follows after the reference below).
This has not been ported to the GPU package, but it lets you reduce the required real-space cutoff quite substantially.
A discussion of this approach is in:

https://doi.org/10.1021/ct4004614
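
An untested sketch of what the relevant lines could look like in your input (the real-space cutoff and the accuracy value are placeholders; please check the pair_style lj/long and kspace_style documentation for the exact settings and allowed combinations):

pair_style      lj/long/coul/long long off 3.0    # r**6 term handled in kspace, coulomb off, 3 sigma real-space cutoff
pair_coeff      * * 1 1
kspace_style    pppm/disp 1.0e-4                  # PPPM solver for the dispersion term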

Thank you for your answers.

It’s a pity that pair_style lj/long/coul/long has not been ported to GPUs,
but I will try it and compare the calculation times.

Thank you!

Also, thank you for the paper reference.