Debugging memory leaks / out of memory errors

I am simulating a droplet of NaCl solution on a gold surface, with around 9000 fluid atoms and 50000 gold atoms. A smaller version of the simulation with 1100 atoms ran without any out-of-memory issues, so I am trying to debug this. The SLURM error message is:

slurmstepd: error: Detected 1 oom-kill event(s) in StepId=3158334.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: cmp277: task 2: Out Of Memory

From the log file, the maximum memory usage per MPI rank is:

Per MPI rank memory allocation (min/avg/max) = 11.2 | 11.91 | 14 Mbytes

which suggests it is not memory limited. I use the stable version of LAMMPS (2 Aug 2023 - Update 2) and tried to run the simulation with 32 cores and 4 GB mem-per-cpu.

My initial thought is that the ave/chunk and ave/time fixes, in addition to the nvt and nve fixes, are causing this. But as I am still a beginner in LAMMPS and have yet to fully understand the C++ source code, I am not 100% sure.

Has someone else faced an out-of-memory error in such a case? If so, how does one go about debugging it?

Relevant lines of input script:

## This is where the simulation runs out-of-memory ##

print ""
print "PRODUCTION RUN"
print ""

# Time-averaged calculations

# chunk/atom helps get the space-averaged property values
compute cc1 FLUID chunk/atom bin/3d x lower 0.3 y lower 0.3 z lower 0.3
compute cc2 FLUID com/chunk cc1

# Use the computes defined earlier to get the time-averaged center of mass (to check for drift in droplet), space-averaged number and mass density of FLUID atoms
fix at_com FLUID ave/time 1000 10 10000 c_cc2[*] file outputs/data/com.dat mode vector
fix ac_dn FLUID ave/chunk 1000 10 10000 cc1 density/number ave running file outputs/data/density_number_FLUID(chunk).dat overwrite
fix ac_dm FLUID ave/chunk 1000 10 10000 cc1 density/mass ave running file outputs/data/density_mass_FLUID(chunk).dat overwrite

# Record time-avg FLUID temperature, to analyze fluctuations
compute cc3 FLUID temp
fix at_fluid_temp FLUID ave/time 1000 10 10000 c_cc3 file outputs/data/temp_FLUID.dat

timestep 1
fix nvt1 WALL nvt temp ${Temp} ${Temp} $(100*dt)
fix nve1 FLUID nve

run ${N_prod}

write_data outputs/out.lammpsdata pair ij
write_restart outputs/out.restartlammps

This information is only a lower bound on the memory consumption, and it reports the memory use only at the beginning of a run. The memory use can increase significantly during a run depending on the simulation settings and features used.

You can get a more detailed view of the memory allocation by calling the “info memory” command repeatedly during a run, e.g. with:

run ${N_prod} every 100 "info memory"

3d binning is often a big consumer of memory, so you have to make sure not to use too fine a grid.
Halving the bin size doubles the number of grid points in each dimension and thus multiplies the memory requirement by a factor of 8. It rarely makes sense to use a bin size smaller than the diameter of one atom. You are using 0.3, which is very small for a system in real or metal units.
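
To get a feel for the numbers, you can estimate the bin count directly in the input script with an equal-style variable. This is only a sketch: it has to be placed after the simulation box is defined, and the variable names are just illustrative.

# Sketch: estimate how many 3d bins a given bin size produces for the current box.
# lx, ly, lz are LAMMPS thermo keywords; "binsize" and "nbins" are illustrative names.
variable binsize equal 0.3
variable nbins equal ceil(lx/v_binsize)*ceil(ly/v_binsize)*ceil(lz/v_binsize)
print "Approximate number of 3d bins: ${nbins}"

With 0.3-unit bins, even a 100 x 100 x 100 box already gives roughly 3.7e7 bins, and each ave/chunk fix keeps per-bin accumulators on every MPI rank.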

If I change the “run” statement in the “peptide” example of the LAMMPS distribution as follows:

run             2000 pre no post no every 100 "info memory"

When I run it on my desktop with 4 MPI processes, it starts with the memory use message:

Per MPI rank memory allocation (min/avg/max) = 16.01 | 16.22 | 16.41 Mbytes

During the run I get a more-or-less stable memory report of:

Memory allocation information (MPI rank 0):

Total dynamically allocated memory: 16.02 Mbyte
Current reserved memory pool size: 117.4 Mbyte
Maximum resident set size: 178.6 Mbyte

Now I am adding (adapted from your input):

compute cc1 water chunk/atom bin/3d x lower 0.3 y lower 0.3 z lower 0.3
compute cc2 water com/chunk cc1

fix ac_dn water ave/chunk 100 10 1000 cc1 density/number ave running file density_number_water.dat overwrite
fix ac_dm water ave/chunk 100 10 1000 cc1 density/mass ave running file density_mass_water.dat overwrite

And the reported memory use increases to:

Memory allocation information (MPI rank 0):

Total dynamically allocated memory: 93.51 Mbyte
Current reserved memory pool size: 284 Mbyte
Maximum resident set size: 343.2 Mbyte

Of course, since I am using 4 MPI processes, the memory use for the entire node is 4x as large.
This is for a tiny system with 2004 atoms. If I now use the replicate command to make this a 16032-atom system, the memory use increases to:

Memory allocation information (MPI rank 0):

Total dynamically allocated memory: 644.8 Mbyte
Current reserved memory pool size: 1449 Mbyte
Maximum resident set size: 1412 Mbyte

That is about 5.5 GB in total with 4 processes.

I think one can see from this how memory usage can blow up quickly.
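
For reference, the 16032-atom system above is just the 2004-atom peptide box grown by a factor of 8 with the replicate command; a 2x2x2 replication is one way to do it (2004 x 8 = 16032):

# one possible replication that turns the 2004-atom peptide system into 16032 atoms
replicate 2 2 2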

Thank you for the very clear explanation, Dr. Kohlmeyer. I will use this to work out how to reduce my memory usage. I will start by increasing the bin size and then try the bound keyword to restrict the 3d binning to the droplet region.
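
For example, I am considering something along these lines; the 2.0 Angstrom bin size and the bound values are placeholders that I would still have to match to the actual droplet extent:

# Sketch: coarser 2.0 Angstrom bins, restricted to an assumed droplet region
# with the bound keyword (the numeric bounds below are placeholders).
compute cc1 FLUID chunk/atom bin/3d x lower 2.0 y lower 2.0 z lower 2.0 &
        bound x 0.0 60.0 bound y 0.0 60.0 bound z 20.0 80.0 units box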