LAMMPS: data distribution and work distribution

Dear LAMMPS team,

I'm not a user of LAMMPS, but we used it for a few experiments with
checkpointing. I had a look at the paper, which is from 1995; although
nice and detailed, it is probably dated. I also searched the mailing
list, and the last information on load balancing appears to date to
2012. I am interested in the following information about LAMMPS, since
I'm writing a publication that will include a few lines about LAMMPS
and I would like to get the right information directly from the people
involved:

- how does LAMMPS distribute the data to be processed among the
processors?
- how does LAMMPS distribute the work among the processors involved in
the calculation?

We used a version of the software corresponding to a git clone of the
repository from around 10 January 2015.

If you can't or don't want to provide an answer, please point me to a
place where this information is explained.

I saw the following:
http://lammps.sandia.gov/doc/fix_balance.html

I would like to know whether this page contains the most recent
information on the topic.

Thank you very much in advance for your answer and for your time.

Best Regards,

Federico

> Dear LAMMPS team,
>
> I'm not a user of LAMMPS, but we used it for a few experiments with
> checkpointing. I had a look at the paper, which is from 1995; although
> nice and detailed, it is probably dated. I also searched the mailing
> list,

for the most part, that 1995 paper is still quite accurate.

> and the last information on load balancing appears to date to 2012.

this load balancing is an option and is primarily used for special
cases where the default scheme is inefficient *and* users care (not
all of them do).

> I am interested in the following information about LAMMPS, since I'm
> writing a publication that will include a few lines about LAMMPS and
> I would like to get the right information directly from the people
> involved:
>
> - how does LAMMPS distribute the data to be processed among the
> processors?

by default, it uses a spatial decomposition into equal-size domains,
as described in the 1995 paper (the "brick" communication style). the
balance command can adjust this distribution based on particle counts
by growing or shrinking the grid spacing. changing the communication
style to "tiled" switches to recursive bisectioning (again based on
particle counts).
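
for illustration, here is roughly what those options look like in an
input script (a sketch based on the balance and comm_style docs; the
numeric thresholds are arbitrary examples, not recommendations):

    # default "brick" style: regular grid, planes shifted by particle count
    balance 1.2 shift xz 5 1.1

    # "tiled" style: recursive coordinate bisectioning
    comm_style tiled
    balance 1.1 rcb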

via the "processors" command, the processor grid dimensions can be
influenced. by default the domains are chosen to have an optimal
surface to volume ratio, since the communication effort scales with
the subdomain surface, while the computation effort scales with the
volume.
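
for example (assuming a hypothetical run on 32 MPI tasks; an asterisk
lets LAMMPS choose that dimension itself):

    processors 4 4 2    # force a 4x4x2 processor grid
    processors * * 1    # let LAMMPS pick x and y, use one layer in z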

> - how does LAMMPS distribute the work among the processors involved
> in the calculation?

it is generally assumed that the amount of work is proportional to the
number of particles, and for homogeneous systems (a common use case)
that is in turn proportional to the volume.
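
to illustrate where that assumption matters: in a non-homogeneous
system (say, a liquid slab with vacuum above it), equal-volume domains
leave some processors nearly idle, and a grid shift along the sparse
direction can compensate (again a sketch with arbitrary thresholds):

    # shift domain boundaries along z until the imbalance drops below 1.05
    balance 1.1 shift z 10 1.05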

> We used a version of the software corresponding to a git clone of
> the repository from around 10 January 2015.
>
> If you can't or don't want to provide an answer, please point me to a
> place where this information is explained.
>
> I saw the following:
> http://lammps.sandia.gov/doc/fix_balance.html
>
> I would like to know whether this page contains the most recent
> information on the topic.

it is an optional feature and only a subset of LAMMPS users enable it.
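
for completeness, the fix documented on that page applies the same
rebalancing periodically during a run; a minimal sketch with arbitrary
parameters might be:

    # every 1000 steps, rebalance if the imbalance factor exceeds 1.1
    fix lb all balance 1000 1.1 shift xy 10 1.05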

Axel's is a good summary. I'll just add that all of the spatial
decompositions, whether equal-size (as in the 1995 paper) or the
load-balancing options (RCB, etc.), keep the domain owned by each MPI
process a brick-shaped box. They only differ in the communication
pattern to the neighboring bricks (procs), which is what CommBrick and
CommTiled take care of.

And that is the outer level of parallelism. Several of the
acceleration packages exploit intra-node parallelism within one MPI
task, e.g. the GPU, USER-CUDA, USER-OMP, USER-INTEL, and KOKKOS
packages.
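
As a sketch of what enabling one of those packages looks like (assuming
a binary built with USER-OMP; the -sf and -pk command-line flags do the
same thing):

    package omp 4    # 4 OpenMP threads per MPI task
    suffix omp       # use the /omp variants of styles where available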

Steve

Dear Axel and Steve,

thank you both very much for the fast and clear answers!

Federico