Understanding memory usage per processor

Hi All,

I am trying to get an understanding of the memory usage per processor stat that LAMMPS outputs at the end of the run.

At first, I thought this would be the average amount of memory each LAMMPS process uses over the course of its run, but when I periodically look at the memory usage via ps -aux on Linux, the amount of memory each LAMMPS process uses is much less than what’s reported at the end.

For example, LAMMPS will output
Memory usage per processor = 74.8512 Mbytes

but ps will show the VSZ and RSS at 1364012 and 73200 kB, respectively; these values don’t change much over the course of the run.

Regards,
Josh


memory use measurements on Linux are difficult in the first place. the
malloc() function uses mmap() of /dev/zero with copy-on-write for larger
allocations, and those pages don't show up as used unless they are actually
modified and thus copied (-> memory overcommitment). the address space size,
on the other hand, can include mmap()'d device buffers and shared memory
segments. thus the actual amount of memory used is not so easy to determine
and lies somewhere between those two extremes. the output that LAMMPS
produces is based on reporting the size of large memory allocations inside
LAMMPS only; this should be considered a lower bound. also, keep in mind
that this is the memory use on rank 0, which is often larger than on other
ranks, but for unbalanced particle distributions, it may also be much
smaller.

axel.

One additional comment. The stat that LAMMPS reports is from before the start of the first run, not at the end of the run.

Steve

I have never understood why LAMMPS reports the memory numbers; I can get anything from 17 MB per proc to 20 GB per proc … for the same run! I know it’s a Linux issue, but since most HPC is Linux based, it’s just wasted output.

Nigel


i agree with your assessment: you don't understand it.

a) it is *not* memory per processor; it is the memory used by large memory
allocations on rank 0.
b) it is *not* a Linux issue. the memory reported by LAMMPS is the memory
reported by the individual classes' memory_usage() methods. it is thus
completely portable (but dependent on programmers correctly tallying all
memory allocations in their classes), and, as steve already noted, it
reports usage at the beginning of a run and thus misses later
allocations.

while this information may not be useful to you personally, it has been and
still is quite helpful in debugging and in tuning inputs to match a given
machine. it was certainly more useful in times when RAM on supercomputers
was more limited than it is now (keep in mind that LAMMPS has been around a
long time, and there are quite a few things that might be done differently
if LAMMPS were started as a new project now), but on the other hand, there
are people who try to run extremely large systems, and for those it is
quite useful.

the variance you note is quite common in specific cases, e.g., when running
sparse systems with varying numbers of CPUs, or when running with/without
memory-intensive analysis styles enabled. for simple, dense, bulk-system
simulations, the LAMMPS output is a good estimate of a lower limit on the
memory required.

BTW: alternate ways to query the memory used that *are* system specific are
available with the "info config" command. mind you, these are still for
rank 0 only (and optional).
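for reference, a minimal input-script fragment (assuming a LAMMPS build recent enough to have the info command, per the thread above):

```
# anywhere in an input script; prints configuration info,
# including system-specific memory figures, for rank 0
info config
```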

axel.​

I’ll add this:

> a) it is not memory per processor; it is the memory used by large memory allocations on rank 0.

In most simulations, the memory used by each MPI rank (e.g. each core) is
nearly the same, so the output is usually a good estimate of
memory/processor, as advertised. Since it includes all the large memory
allocations that LAMMPS has performed (at start-up time) for your
simulation, it’s generally a good lower bound on the memory your run will
consume, i.e. close to actual use.

Exceptions would be scenarios where you start with an empty box and add all
your atoms during the run (e.g. a granular pour), or where the load balance
(atoms/proc) varies widely during a run; e.g., a big imbalance could result
in one proc using a lot more memory.

> I have never understood why LAMMPS reports the memory numbers; I can get anything from 17 MB per proc to 20 GB per proc … for the same run!

Not clear what this means. Do you mean LAMMPS reports that? Or that some
system monitor reports that? What does 20 GB per proc mean, since no CPU
(e.g. with 16 cores) has 320 GB?

I don’t see how LAMMPS could report a small value and the actual memory use
be >> larger, unless it is one of the exception cases mentioned above.

Steve