[lammps-users] Improve parallel efficiency

_xu · June 22, 2010, 1:52pm

Thanks for your help~
I’m using the AIREBO potential and the “processors 1 1 n” command. Before each run I now do a simple calculation and adjust the value of n, to make sure that each processor does not have too many atoms and that n is small enough. This now makes some sense, though the efficiency is only about 40%.

It’s very interesting that “OpenMP/MPI hybrid parallelization” can give such a significant improvement. Could you explain why it works? Thanks~

Best Regards

xu

2010-06-22

akohlmey · June 22, 2010, 2:15pm

> Thanks for your help~
> I'm using AIREBO potential, and "processors 1 1 n" command. I'm now

i thought your system is _empty_ along the z-axis.
in this case your choice is the worst possible one.
you want to have:
processors n m 1

or rather update to the latest LAMMPS patchlevel and use:
processors * * 1
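
to illustrate why the shape of the grid matters for a system that is empty along z, here is a minimal sketch in python. it is not the code LAMMPS uses; the function name `best_2d_grid` and the box lengths are made up for the example. it simply tries every factorization n x m of the processor count and keeps the one whose sub-domains have the smallest perimeter in the x-y plane, as a rough proxy for the amount of ghost-atom communication.

```python
def best_2d_grid(nprocs, lx, ly):
    """Return (px, py) with px * py == nprocs and minimal sub-domain perimeter.

    lx, ly are the box lengths in x and y (hypothetical values, any unit).
    The z direction is left undivided, matching "processors n m 1".
    """
    best = None
    for px in range(1, nprocs + 1):
        if nprocs % px:
            continue  # px must divide the processor count evenly
        py = nprocs // px
        # perimeter of one sub-domain in the x-y plane; a rough proxy
        # for how many ghost atoms have to be exchanged per step
        perimeter = 2.0 * (lx / px + ly / py)
        if best is None or perimeter < best[0]:
            best = (perimeter, px, py)
    return best[1], best[2]


# example: 8 MPI tasks on a box that is twice as long in x as in y
px, py = best_2d_grid(8, 100.0, 50.0)
print(f"processors {px} {py} 1")
```

with "processors 1 1 n" the cuts instead run along z, so for a system that is empty along z most of the slabs would hold few or no atoms.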

> trying to do some simple calculation before run, adjust the n value,
> to make sure that each processor do not have too many atoms and the n
> value is small enough. Now it can make some sense, though the
> efficiency is about 40%

actually, in combination with what you say above,
this doesn't make so much sense to me.

> It's very interesting that "OpenMP/MPI hybrid parallelization" can
> have such a significant improvement, Could you explain why it works?

it is not at all surprising. with a 2x quad core node, you can
have up to 8 MPI tasks per node, but you have only one communication
link, so all tasks have to share its bandwidth and their messages
get serialized, which increases latencies. higher latencies in
particular are death to parallel performance when you get close
to the limit of scaling.

with hybrid OpenMP/MPI you have a smaller number of MPI tasks
on each node, each with a comparatively larger chunk of data to
work on. particularly when using a larger number of nodes, this
reduces the communication bottleneck. for very small numbers
of processors, going all-MPI is usually more effective, since
it improves cache locality and communication is infrequent
and thus doesn't matter so much.
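
the bookkeeping behind this argument can be sketched in a few lines of python. this is a toy model only, not a benchmark: the function name `layout`, the node count and the atom count are assumptions made up for the example, and it says nothing about actual timings. it only counts how many MPI tasks end up sharing one network link and how many atoms each task owns.

```python
def layout(total_atoms, nodes, cores_per_node, threads_per_task):
    """Count MPI tasks, tasks per network link, and atoms per task.

    threads_per_task = 1 corresponds to an all-MPI run; larger values
    correspond to hybrid OpenMP/MPI with that many threads per MPI task.
    All inputs are hypothetical illustration values.
    """
    if cores_per_node % threads_per_task:
        raise ValueError("threads_per_task must divide cores_per_node")
    tasks_per_node = cores_per_node // threads_per_task
    mpi_tasks = nodes * tasks_per_node
    return {
        "mpi_tasks": mpi_tasks,
        # one communication link per node is assumed, as in the text above
        "tasks_sharing_link": tasks_per_node,
        "atoms_per_task": total_atoms / mpi_tasks,
    }


# 16 nodes with 2x quad core (8 cores) each, 64000 atoms (made-up numbers)
all_mpi = layout(64000, nodes=16, cores_per_node=8, threads_per_task=1)
hybrid = layout(64000, nodes=16, cores_per_node=8, threads_per_task=4)
print("all-MPI:", all_mpi)
print("hybrid :", hybrid)
```

in this example the hybrid run has 2 instead of 8 tasks competing for each link, and each task works on 2000 instead of 500 atoms.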

there are a number of factors that come into play here.
we are considering doing more detailed benchmarks on a
variety of machines and summarize our observations and
conclusions in a paper some time later this year.

if there is interest i can send a message to the list
when it is published or otherwise available.

cheers,
axel.