[lammps-users] Memory usage in minimizations

Dear users:

I am using the reax/c potential to model alumina systems. I have quickly realized that memory limitations are defining how big and how long I can run a simulation. I have had several memory allocation errors when running minimizations. I am wondering if there are methods of structural minimization that have less memory demands. Also, before I run dynamics I was wondering if there are any “rules of thumb” to estimate the memory requirements. My current minimization method/script is below:

Aidan and Metin will have to comment. I don't know how that package uses
memory. The rest of LAMMPS bookkeeps its memory usage fairly carefully.
How do you know you are having memory issues? Are you getting error
messages? How big is your system?

Steve

I have tried submitting my script with a couple of different system sizes. I am using the 21Dec10 version (I am currently updating to the newest version to test for the same problem). I have been receiving this error message:


1005 10000 0 -1826627.3 471128 -0.21708062 0 142087.98 0 0 0 0 0 275018.48 -1647910.6 0 1266864.6 1.3389161 -1319439.1 0 90627.747 -1319439.1 5466.78

lmp_star-ompi:21795 terminated with signal 11 at PC=2ab7302b3cbf SP=7fff5b9eb270. Backtrace:
/share/apps/mpi/openmpi-1.4/intel-11.1/lib/libopen-pal.so.0(opal_memory_ptmalloc2_int_malloc+0x103f)[0x2ab7302b3cbf]
/share/apps/mpi/openmpi-1.4/intel-11.1/lib/libopen-pal.so.0[0x2ab7302b1da5]
ERROR: failed to allocate 135351232896 bytes for array list:three_bodies
/home/sxc033/lammps-21Dec10/src/lmp_star-ompi(_Z7smalloclPcP19ompi_communicator_t+0x1e)[0x6ec21e]

The job fails on 96 cores across 12 nodes with 144 GB of memory. It also fails on 96 cores across 24 nodes with 288 GB of memory. It succeeds on 48 cores on 1 node with 256 GB of memory. University of Arkansas HPC support suggests that the problem occurs on the cluster (but not on shared memory) and lies either in the memory allocation itself or in the interaction of the allocation with openmpi.

Shawn P Coleman

University of Arkansas
Mechanical Engineering

Earlier this week, we released a fix for one memory allocation problem in
reax/c. Have you tried it out? Another thing to try is to use the clear
command at the top of each iteration of the loop, and read the geometry in
from a restart file. The restart file should be written at the bottom of the
loop.
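
A minimal sketch of that loop, built on the minimization script posted earlier in this thread. The restart file name relax.restart is hypothetical, and the commands that must be re-issued after clear are left as a placeholder comment because they depend on the original setup, which was not posted. It assumes, as the clear documentation describes, that input script variables survive a clear, so the loop counter keeps advancing.

```
# One-time setup (units, atom_style, read_data, pair_style, pair_coeff, ...)
# is assumed to have been done above this point.
write_restart relax.restart        # hypothetical file name

variable M string "100"
variable T string "10000"
label loopa
variable a loop $M
print "loopa=$a"

clear                              # frees all memory; input script variables are kept
read_restart relax.restart         # geometry written by the previous pass
# Re-issue here the settings that the restart file does not store
# (pair_style, pair_coeff, and any fixes from the original setup).

thermo 1
thermo_modify lost warn flush yes
fix 1 all box/relax tri 1 vmax 0.0001
min_modify line quadratic
minimize 0 0 $T $T
unfix 1
minimize 0 0 $T $T

write_restart relax.restart        # bottom of the loop
next a
jump SELF loopa
```

The initial write_restart before the loop is only there so that the first pass has a file to read; thermo and min_modify settings are repeated inside the loop because clear resets them to their defaults.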

Aidan

From: Steve Plimpton <[email protected]>
Date: Thu, 24 Feb 2011 08:57:43 -0700
To: Shawn Coleman <[email protected]...>, Aidan Thompson <[email protected]>,
Hasan Metin Aktulga <[email protected]...>
Cc: "[email protected]" <[email protected]>
Subject: Re: [lammps-users] Memory usage in minimizations

Aidan and Metin will have to comment. I don't know how that package uses
memory. The rest of LAMMPS bookkeeps its memory usage fairly carefully.
How do you know you are having memory issues? Are you getting error
messages? How big is your system?

Steve

Dear users:
I am using the reax/c potential to model alumina systems. I have quickly
realized that memory limitations are defining how big and how long I can run
a simulation. I have had several memory allocation errors when running
minimizations. I am wondering if there are methods of structural
minimization that have less memory demands. Also, before I run dynamics I
was wondering if there are any "rules of thumb" to estimate the memory
requirements. My current minimization method/script is below:

#######################
variable M string "100"
variable T string "10000"
label loopa
variable a loop $M
print "loopa=$a"
thermo 1
thermo_modify lost warn flush yes
fix 1 all box/relax tri 1 vmax 0.0001
min_modify line quadratic
minimize 0 0 $T $T
thermo 1
thermo_modify lost warn flush yes
unfix 1
min_modify line quadratic
minimize 0 0 $T $T

next a
jump SELF loopa
#######################

As you can see I am performing several minimizations cycling fix box/relax
on and off. I have found in previous simulations that I could not achieve
the desired pressure with one fix box/relax minimization. To reduce the
memory demands I have tried varying the number of minimization cycles
($M) and the maximum number of iterations and force evaluations ($T).
If you have any suggestions I will greatly appreciate your help.
Thanks,

Shawn P Coleman
University of Arkansas
Mechanical Engineering


shawn,

the best way to check for a memory allocation problem would be to
create a smallish test input, compile a serial version with debuginfo
and then run it under valgrind with detailed memory tracking.

that should help to identify locations where memory is not properly freed.

in some cases, this may only happen for parallel runs. then it is a bit
more involved, but you can still use valgrind; you just have to put %p (the
process id) in the --log-file name to get a separate output file for each MPI task. i've been through this
a few times and it takes a little effort and patience to get it started,
but that beats the hell out of staring at source code and hoping for
an inspiration.

if this would happen only when running across multiple nodes, but
not on a local node, then i would check the MPI library rather than
LAMMPS.

cheers,
    axel.

Thanks for your suggestions. I am compiling the newest version right now and will try them out.

Shawn P Coleman

University of Arkansas
Mechanical Engineering

Shawn,

ERROR: failed to allocate 135351232896 bytes for array list:three_bodies
/home/sxc033/lammps-21Dec10/src/lmp_star-ompi(_Z7smalloclPcP19ompi_communicator_t+0x1e)[0x6ec21e]

I do not think there is anything wrong with the cluster or your openmpi installation. The problem stems from the allocation of the 3-body list: it is asking for 135351232896 bytes (about 135 GB), apparently due to a bug. Not surprisingly, it runs when you increase the available memory to 256 GB. The fix Aidan mentioned addresses a problem with the 3-body list allocation, so I believe it will solve your problem. Please keep me updated.
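
As a quick check of the size quoted above, the request can be converted to gigabytes with equal-style variables. This is only an illustrative sketch: the variable names bytes, gb, and gib are made up for the example, and the byte count is copied from the error message.

```
# Size of the failed request, copied from the error message
variable bytes equal 135351232896
variable gb    equal v_bytes/1.0e9           # decimal gigabytes (10^9 bytes)
variable gib   equal v_bytes/1073741824.0    # binary gibibytes (2^30 bytes)
print "list:three_bodies request = ${gb} GB (${gib} GiB)"
```

Both variable and print can be issued before a simulation box exists, so these lines can be run on their own.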

Computing the per-atom stress with reax/c requires some extra effort, and unfortunately I do not have time to work on it right now. I will see if my collaborators back at Purdue are interested in working on it.

Thanks,
Metin

Dear Metin,

I believe we sent a fix for this problem some time ago. You may want to check with Aidan.

Best,

Andres

Andres,

I was indeed referring to your fix in my previous email. Thanks for your time on this!

Best,
Metin

I was able to successfully complete the simulation that was failing earlier because of memory allocation errors after updating to the 18Feb11 version of LAMMPS. Thanks again for all of your help.

Shawn P Coleman

University of Arkansas
Mechanical Engineering