Complex domain decomposition and core mapping in LAMMPS

Dear LAMMPS users,

I’m interested in the current status of domain decomposition and
load balancing (processor assignment) in LAMMPS; below I refer to
both as "topologies". I use the processors command and the balance
fix, but I’m looking for more general tools (I describe the problems
I’m solving below). So I’m interested in any work on these issues:
maybe someone knows of a LAMMPS version developed by some laboratory
that implements more complicated topology strategies, such as
recursive orthogonal bisection or others.

In my simulations, I model suspension flow in a network of connected
pipes, so most of the volume of the computational domain is free of
particles. Thus, most processors are idle. I’m trying to use the
processors command to improve the situation by, for instance, dividing
the domain into sub-domains in only one direction (z, if my tubes are
longer in this direction). But this solves the problem only for
relatively simple networks.
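
For readers who want to see the effect in numbers, here is a minimal Python sketch (not LAMMPS code) that counts idle ranks for a regular processor grid versus slabs along z. The geometry (a single thin tube along z in a unit box), the rank counts, and the helper names `owner_counts` and `imbalance` are all assumptions made for illustration.

```python
import random

def owner_counts(points, grid, box=1.0):
    """Count particles per rank for a regular px*py*pz processor grid."""
    px, py, pz = grid
    counts = [0] * (px * py * pz)
    for x, y, z in points:
        ix = min(int(x / box * px), px - 1)
        iy = min(int(y / box * py), py - 1)
        iz = min(int(z / box * pz), pz - 1)
        counts[(ix * py + iy) * pz + iz] += 1
    return counts

def imbalance(counts):
    """Maximum load divided by mean load; 1.0 means perfectly balanced."""
    return max(counts) / (sum(counts) / len(counts))

rng = random.Random(0)
# Hypothetical geometry: one thin tube along z, centred in a unit box.
tube = [(0.5 + rng.uniform(-0.05, 0.05),
         0.5 + rng.uniform(-0.05, 0.05),
         rng.random()) for _ in range(64000)]

cubic = owner_counts(tube, (4, 4, 4))    # regular 3D grid, 64 ranks
slabs = owner_counts(tube, (1, 1, 64))   # 64 slabs along z only
idle_cubic = sum(1 for c in cubic if c == 0)
idle_slabs = sum(1 for c in slabs if c == 0)

print("4x4x4 :", idle_cubic, "idle ranks, imbalance", round(imbalance(cubic), 2))
print("1x1x64:", idle_slabs, "idle ranks, imbalance", round(imbalance(slabs), 2))
```

In this toy setup only the four central columns of the 4x4x4 grid contain particles, so most ranks own nothing, while the z-only slabs all get a share. As soon as a second tube runs along x or y, the slab trick stops helping, which is the limitation described above.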

So I’m wondering whether anyone is working on implementing a general
topology layer in LAMMPS that would allow expressing complicated
computational domains as well as processor mappings. If no one is
working on this problem, do you think it might be useful for LAMMPS
users in general?

kilill,

since nobody else stepped up so far,
a few comments and a suggestion from
somebody who has some vested interest
in these issues, and a partial solution to offer.

> Dear LAMMPS users,
>
> I’m interested in the current status of domain decomposition and
> load balancing (processor assignment) in LAMMPS; below I refer to
> both as "topologies". I use the processors command and the balance
> fix, but I’m looking for more general tools (I describe the problems
> I’m solving below). So I’m interested in any work on these issues:
> maybe someone knows of a LAMMPS version developed by some laboratory
> that implements more complicated topology strategies, such as
> recursive orthogonal bisection or others.

i don't know of anybody doing this, and it sounds awfully complicated.

> In my simulations, I model suspension flow in a network of connected
> pipes, so most of the volume of the computational domain is free of
> particles. Thus, most processors are idle. I’m trying to use the
> processors command to improve the situation by, for instance, dividing
> the domain into sub-domains in only one direction (z, if my tubes are
> longer in this direction). But this solves the problem only for
> relatively simple networks.

> So I’m wondering whether anyone is working on implementing a general
> topology layer in LAMMPS that would allow expressing complicated
> computational domains as well as processor mappings. If no one is
> working on this problem, do you think it might be useful for LAMMPS
> users in general?

i believe the vast majority of LAMMPS applications won't need this and
would not want the overhead of having to deal with complex topologies.
the field that may benefit from this the most would be people doing
discrete element modeling or granular media, since that almost always
has to deal with sparsely populated domains. in fact, this is where
the basic concept of the balance command originates from.

as you have already discovered, this approach works reasonably well in
one dimension, but has problems in two and even more in three
dimensions. moreover, the number of particles is not always a good
indicator for the amount of work done in a domain, which typically
also depends a lot on the computational effort of an interaction model
and the number of neighbors.
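
To make the point about particle count versus actual work concrete, here is a small Python sketch (again not LAMMPS code) that places slab boundaries along one dimension so that each slab carries roughly equal weight. The per-particle weights, and the names `balanced_cuts` and `slab_loads`, are hypothetical; in a real run the weight would have to come from something like neighbor counts or measured time.

```python
from bisect import bisect_left

def balanced_cuts(coords, nslabs, weights=None):
    """Return nslabs-1 cut positions that split the total weight evenly."""
    if weights is None:
        weights = [1.0] * len(coords)
    order = sorted(range(len(coords)), key=coords.__getitem__)
    total = sum(weights)
    cuts, acc, k = [], 0.0, 1
    for i in order:
        acc += weights[i]
        # Emit a cut each time the running weight reaches k/nslabs of the total.
        while k < nslabs and acc >= total * k / nslabs:
            cuts.append(coords[i])
            k += 1
    return cuts

def slab_loads(coords, weights, cuts):
    """Sum of weights per slab; a particle sitting on a cut goes to the lower slab."""
    loads = [0.0] * (len(cuts) + 1)
    for z, w in zip(coords, weights):
        loads[bisect_left(cuts, z)] += w
    return loads

# Assumed toy data: 100 evenly spaced particles; those with z < 0.5
# cost three times as much (e.g. a denser neighborhood).
zs = [i / 100 for i in range(100)]
work = [3.0 if z < 0.5 else 1.0 for z in zs]

count_cuts = balanced_cuts(zs, 4)          # equal particle counts
work_cuts = balanced_cuts(zs, 4, work)     # equal estimated work

print("count-based cuts:", count_cuts, "->", slab_loads(zs, work, count_cuts))
print("work-based cuts: ", work_cuts, "->", slab_loads(zs, work, work_cuts))
```

With equal particle counts per slab the two expensive slabs carry three times the work of the cheap ones; weighting the cuts by estimated work evens this out, but only if the weights are a good proxy for the real cost.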

with this in mind, there are two rather simple things that could be
done without having to rewrite the complete communication
infrastructure:
- add a timing feedback to the load balancing, i.e. measure the load
imbalance and try to add this as a bias to the particle count based
load distribution.
- use a particle decomposition on top of the domain decomposition for
parallelization. this is implemented in the multi-threading support
of the USER-OMP package, and it works rather transparently alongside
the domain decomposition based parallelization. the only issue here
is that the USER-OMP package was implemented with the goal of making
efficient use of a small number of threads, resulting in limited
scaling with a lot of threads. also, only the most time consuming
parts of the calculation were initially multi-threaded. thus amdahl's
law is rearing its ugly head rather quickly.
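
One possible reading of the timing-feedback idea in the first item, as a Python sketch: given the current slab edges along one axis and the time each slab took, move the edges so that every slab is predicted to take the same time. This is only an illustration under a strong assumption (cost is spread uniformly inside each slab); it is not the scheme proposed above in detail, nor anything LAMMPS implements, and `rebalance_from_timings` is a made-up name.

```python
def rebalance_from_timings(edges, times):
    """Shift slab edges so each slab gets an equal share of the measured time.

    edges: nslabs+1 increasing positions; times: measured time per slab.
    Assumes (hypothetically) that cost is uniform inside each old slab.
    """
    n = len(times)
    total = sum(times)
    if total <= 0:
        return list(edges)          # nothing measured, keep the layout
    new_edges = [edges[0]]
    slab, done = 0, 0.0             # done = time in old slabs already passed
    for k in range(1, n):
        target = total * k / n
        # Walk forward until the old slab containing the target is found.
        while slab < n - 1 and done + times[slab] < target:
            done += times[slab]
            slab += 1
        frac = (target - done) / times[slab] if times[slab] > 0 else 0.0
        new_edges.append(edges[slab] + frac * (edges[slab + 1] - edges[slab]))
    new_edges.append(edges[-1])
    return new_edges

# Assumed measurement: four equal slabs, the first one took half the time.
old_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
measured = [4.0, 1.0, 1.0, 2.0]
new_edges = rebalance_from_timings(old_edges, measured)
print(new_edges)
```

In practice the uniform-cost assumption is crude, so one would repeat this every so often and damp the moves, but it shows how a measured imbalance can drive the boundaries directly instead of the particle count.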

i have some ideas to address both issues, but i am currently lacking
the time to implement them in a more general way. however, if you are
willing to share some representative input decks, i am happy to
discuss off-list and make an effort to provide you with some prototype
implementation.

ciao,
    axel.

Separate from Axel's suggestions, which are good ones,
more general load-balancing is a topic we've thought about, but we haven't
had much motivation to work on it, due to a lack of problems that need it.

I've used RCB (recursive coordinate bisection) in
the past to good effect in other codes. Space-filling
curves are another option for irregular problems.
The downside of these approaches is the overhead they
incur for finding ghost atoms and communicating them.
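
For readers unfamiliar with RCB, the basic idea fits in a few lines of Python. This is a generic textbook-style sketch, not code from LAMMPS or from the other codes mentioned; the function name `rcb` and the toy point set are assumptions. It only assigns points to parts and leaves out the hard part named above, finding and communicating ghost atoms across irregular boundaries.

```python
import random

def rcb(points, nparts):
    """Split points into nparts lists of near-equal size by recursive bisection."""
    if not points:
        return [[] for _ in range(nparts)]
    if nparts == 1:
        return [points]
    dims = len(points[0])
    # Cut perpendicular to the axis along which the points extend furthest.
    axis = max(range(dims),
               key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    pts = sorted(points, key=lambda p: p[axis])
    left_parts = nparts // 2
    # Give each side a share of points proportional to its share of parts.
    split = len(pts) * left_parts // nparts
    return rcb(pts[:split], left_parts) + rcb(pts[split:], nparts - left_parts)

rng = random.Random(1)
# Hypothetical sparse geometry: an L-shaped pair of thin tubes in a unit box.
along_z = [(0.1, 0.1, rng.random()) for _ in range(400)]
along_x = [(rng.random(), 0.1, 0.9) for _ in range(400)]
parts = rcb(along_z + along_x, 8)
sizes = [len(p) for p in parts]
print(sizes)
```

Every part ends up with the same number of points regardless of how empty the box is, which is what makes this attractive for sparse geometries; the resulting sub-domains no longer form a regular grid, which is where the communication overhead comes from.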

I think the easiest way to do this in LAMMPS is to replace the Comm class
with an alternate version. It's something we may try out in the
next 6 months to a year due to interest from another project.

Steve