[lammps-users] MPI/OpenMP optimalization with EAM potential.

Dear Axel (and other LAMMPS users)

I noticed your interesting discussion regarding MPI/OpenMP with user Sabine from the beginning of January 2011, and I have some questions regarding this.

In our group we do some simulations of metal nanopillars now, where we have a cylindrical, freestanding pillar that is attached to a surface (thin film) which is kept fixed at the bottom. Then we indent the top of the pillar using a planar indenter. (A picture showing the situation is attached.) The potential used is an EAM potential.
This system, however, produces a very uneven load across nodes when running on a supercomputer. My guess is that LAMMPS divides the whole simulation box into equally sized cubes, regardless of whether there are (many) atoms there or not. This will of course result in some cubes with few or no atoms inside, and others that are full of atoms.

Do you see a way that your hybrid MPI/OpenMP code in LAMMPS-ICMS can make this simulation (a lot) faster? I wonder specifically how best to specify the OpenMP flags, and how to use this in combination with the "processors" command in LAMMPS. We are aiming for very large pillars in these simulations, so any way of improving the efficiency is greatly appreciated.

Thank you so much for your help, and for all your work on LAMMPS. It is a wonderful tool to do science with.

Also, if other users have suggestions or experience with this kind of problem, please contribute to the discussion.

Sincerely,
Christer H. Ersland.

pillar_d400.png

> Dear Axel (and other LAMMPS users)

dear christer,

> I noticed your interesting discussion regarding MPI/OpenMP with user Sabine from the beginning of January 2011, and I have some questions regarding this.

ok.

> In our group we do some simulations of metal nanopillars now, where we have a cylindrical, freestanding pillar that is attached to a surface (thin film) which is kept fixed at the bottom. Then we indent the top of the pillar using a planar indenter. (A picture showing the situation is attached.) The potential used is an EAM potential.

ok.

> This system, however, produces a very uneven load across nodes when running on a supercomputer. My guess is that LAMMPS divides the whole simulation box into equally sized cubes, regardless of whether there are (many) atoms there or not. This will of course result in some cubes with few or no atoms inside, and others that are full of atoms.

you don't have to guess. the LAMMPS parallelization papers and
presentations state
that the scheme that LAMMPS employs is exactly that. LAMMPS uses a static
domain decomposition assuming a homogeneous particle density across the whole
system without any kind of load balancing.

> Do you see a way that your hybrid MPI/OpenMP code in LAMMPS-ICMS can make this simulation (a lot) faster? I wonder specifically how best to specify the OpenMP flags, and how to use this in combination with the "processors" command in LAMMPS. We are aiming for very large pillars in these simulations, so any way of improving the efficiency is greatly appreciated.

optimizing how to run LAMMPS for such a system is a multi-step procedure.
first of all, you can try to distribute the load better with MPI-only.
if you specify the processors keyword as:
processors 2 2 *
and then use only processor counts that are multiples of 4, you should have a
better load distribution.
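as a minimal input-script sketch (the surrounding commands are hypothetical placeholders, not your actual input), this could look like:

```
# hypothetical input fragment: fix the x-y processor grid to 2x2 so that
# every MPI domain cuts through the pillar cross section; the remaining
# ranks are stacked along z.
units       metal
boundary    p p f
processors  2 2 *
# ... read_data, pair_style eam/alloy, fixes etc. as in your actual input
```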

that still leaves the fixed bottom area. here you can play with extending the
box in the hope of having the domains set up so that the number of particles
per domain is more balanced. how well that works depends on the total
number of processors that you intend to use.

the OpenMP parallelization in its current implementation does not scale as
well as the MPI parallelization, but in this specific case it can help
keep the domain decomposition in an optimal range and then provide
additional parallelism on top of that.
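since you asked how to specify the OpenMP flags: a minimal launch sketch, assuming Open MPI's mpirun, 8-core nodes, and the command-line switches of recent LAMMPS builds with the OpenMP-enabled styles (the binary name, core counts, and input file name are assumptions):

```shell
# hypothetical job-script fragment: 8 MPI ranks, 2 ranks per 8-core node,
# 4 OpenMP threads per rank (counts are assumptions for this sketch)
export OMP_NUM_THREADS=4
# -sf omp selects the /omp pair styles, -pk omp sets threads per rank
# (command-line switches of recent LAMMPS versions)
mpirun -np 8 -npernode 2 lmp -sf omp -pk omp 4 -in in.pillar
```

the idea is to keep the number of MPI domains small enough that the "processors 2 2 *" layout stays balanced, and let the threads use the remaining cores.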

EAM is particularly problematic in this case, since you have additional MPI
communication within the pair style (to communicate the atom/electron
densities), which may have an impact on the scaling.

but then again, all of the /omp potentials employ the portable and "safe"
subset of the optimizations from the OPT package, so there may even
be some gain without using OpenMP.

everything beyond that is very machine specific, so you should describe
what kind of hardware you plan to run on and what node/processor counts
you hope to scale to.

cheers,
    axel.