Recent article on improved scaling for inhomogeneous systems

Hello,

I recently came across an article from the Voth group that describes the use of Hilbert space-filling curves to obtain improved scaling for inhomogeneous systems (http://dx.doi.org/10.1021/ct400727q). They demonstrate a significant improvement relative to LAMMPS. How likely is it for an algorithm like this to make it into the LAMMPS distribution?

Dan

> Hello,
>
> I recently came across an article from the Voth group that describes the
> use of Hilbert space-filling curves to obtain improved scaling for
> inhomogeneous systems (http://dx.doi.org/10.1021/ct400727q). They
> demonstrate a significant improvement relative to LAMMPS. How likely is it
> for an algorithm like this to make it into the LAMMPS distribution?

how likely is it for you to get out your text editor and implement it yourself?

:wink:

i have not seen the article, but it is fairly easy to generate
lopsided benchmarks with LAMMPS for inhomogeneous systems.

axel.

> Hello,
>
> I recently came across an article from the Voth group that describes the
> use of Hilbert space-filling curves to obtain improved scaling for
> inhomogeneous systems (http://dx.doi.org/10.1021/ct400727q). They
> demonstrate a significant improvement relative to LAMMPS. How likely is it
> for an algorithm like this to make it into the LAMMPS distribution?

> how likely is it for you to get out your text editor and implement it
> yourself?
>
> :wink:

There is definitely a non-zero probability that I will do so. :slight_smile: However,
my motivation will hinge, in part, on the benefit it would provide to
others in the LAMMPS community. I work with implicit solvent systems, so I
cannot harness the LAMMPS parallelization as is.

> i have not seen the article, but it is fairly easy to generate
> lopsided benchmarks with LAMMPS for inhomogeneous systems.

I agree. Nevertheless, I am intrigued.

> axel.

Thanks for the feedback,

Dan

Hi Dan, Axel,

I wrote the software in question, so I thought I should probably respond!

The sparse data approaches I used to reduce memory could probably be added to LAMMPS fairly easily, but unfortunately the load balancer is a little more involved: due to the way I use the Hilbert curve, each processor can be assigned spatial domains with irregular shapes (i.e. each processor is assigned a spatial region which is unlikely to be a “simple” parallelepiped).

This would likely require some fairly significant changes to LAMMPS’ communication routines, as one can no longer assume each spatial domain shares planar interfaces with 26 neighbour domains. This means that many of the optimizations LAMMPS uses when sharing particle data between processors no longer work as expected (communication via sweeps along each axis, determining which particles should be sent to which processor using cutoff planes, etc.).

I think it might be possible to implement the Hilbert curve load balancer as a fix in LAMMPS, but identifying/adjusting all the parts of the LAMMPS code which assume the existing domain decomposition approach might be non-trivial. In particular, making the fix play nicely with isobaric ensembles etc. might require some tweaks!
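
If it helps to make the load balancing idea concrete, here is a rough, simplified sketch (my own illustration, not the code from the paper): key each particle by the position of its grid cell along a space-filling curve, sort on that key, and hand each processor an equally sized contiguous slice of the sorted list. I've used a Morton (Z-order) key below because it fits in a few lines; a Hilbert key would slot into the same place and gives better spatial locality, which is why the paper uses it.

// Simplified sketch (not the code from the paper): assign particles to
// processors by cutting a space-filling-curve ordering into equal chunks.
// A Morton (Z-order) key is used for brevity; a Hilbert key would slot in
// the same place and give better locality.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Particle { double x, y, z; };

// Spread the 10 low bits of v so that bit i moves to bit 3*i.
static uint32_t spread_bits(uint32_t v) {
    v &= 0x3ff;
    v = (v | (v << 16)) & 0x030000ff;
    v = (v | (v << 8))  & 0x0300f00f;
    v = (v | (v << 4))  & 0x030c30c3;
    v = (v | (v << 2))  & 0x09249249;
    return v;
}

// Interleave three 10-bit cell coordinates into a 30-bit Morton key.
static uint32_t morton_key(uint32_t ix, uint32_t iy, uint32_t iz) {
    return spread_bits(ix) | (spread_bits(iy) << 1) | (spread_bits(iz) << 2);
}

// Sort particles along the curve, then give each of the nranks processors
// an equally sized contiguous slice of the sorted list.  Each slice follows
// the curve, so the spatial region it covers is generally irregular.
std::vector<int> assign_owners(const std::vector<Particle>& p,
                               double box, uint32_t ncell, int nranks) {
    std::vector<std::pair<uint32_t, std::size_t>> keyed(p.size());
    for (std::size_t i = 0; i < p.size(); ++i) {
        uint32_t ix = std::min(ncell - 1, uint32_t(p[i].x / box * ncell));
        uint32_t iy = std::min(ncell - 1, uint32_t(p[i].y / box * ncell));
        uint32_t iz = std::min(ncell - 1, uint32_t(p[i].z / box * ncell));
        keyed[i] = { morton_key(ix, iy, iz), i };
    }
    std::sort(keyed.begin(), keyed.end());

    std::vector<int> owner(p.size());
    for (std::size_t r = 0; r < keyed.size(); ++r)
        owner[keyed[r].second] = int(r * nranks / keyed.size());
    return owner;
}

Because each slice simply follows the curve, the spatial region a processor ends up owning is generally irregular, which is exactly why the communication changes described above become necessary.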

J.

[...]

> I recently came across an article from the Voth group that describes the
> use of Hilbert space-filling curves to obtain improved scaling for
> inhomogeneous systems (http://dx.doi.org/10.1021/ct400727q). They
> demonstrate a significant improvement relative to LAMMPS. How likely is it
> for an algorithm like this to make it into the LAMMPS distribution?

> how likely is it for you to get out your text editor and implement it
> yourself?
>
> :wink:
>
> There is definitely a non-zero probability that I will do so. :slight_smile:
> However, my motivation will hinge, in part, on the benefit it would provide
> to others in the LAMMPS community. I work with implicit solvent systems, so
> I cannot harness the LAMMPS parallelization as is.

ok. performance of implicit solvent simulations is a particular
weakness of LAMMPS. in general, you should see some benefit from using
MPI+OpenMP for your systems in combination with the existing load
balancing scheme; the existing load balancing usually works better
when the subdomains are larger. i also have to admit that there have
not been many optimizations for multi-threading of bonded interactions
yet, so there is significant potential for improvement there.
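
e.g. something along these lines (the exact balance settings and the
file/binary names below are only placeholders, check the fix balance
and suffix docs for the LAMMPS version you use):

# in the input script: rebalance by shifting the cutting planes every 1000 steps
fix lb all balance 1000 1.05 shift xyz 10 1.05

# run with 8 MPI ranks x 4 OpenMP threads each, selecting the USER-OMP styles
env OMP_NUM_THREADS=4 mpirun -np 8 lmp_mpi -sf omp -in in.myscript

how much this buys you depends on how many threads per MPI rank make
sense on your hardware, so it is worth benchmarking a few combinations.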

looking at the paper more closely, i would claim that quite a bit of
the performance gains over LAMMPS are due to much more effective data
structures. LAMMPS uses a lot of Array-of-Arrays structures, which are
poison for performance in C/C++: they prevent vectorization and make
data access with multi-threading less efficient.

in other words, the presented CG-MD code is particularly optimized for
a use case where LAMMPS is particularly weak.
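
to illustrate the data layout point (simplified, this is not actual
LAMMPS code): with an array-of-arrays every access goes through an
extra pointer and the rows may be scattered in memory, while a single
flat allocation keeps everything contiguous, which is far friendlier
to the vectorizer and the prefetcher.

#include <cstddef>
#include <vector>

// Array-of-arrays layout: x[i] is a separately allocated row of 3 doubles.
// Every access pays an extra indirection and the rows need not be
// contiguous, which hurts vectorization and prefetching.
double sum_x_aoa(double** x, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += x[i][0];
    return s;
}

// Flat layout: all coordinates in one contiguous block, x[3*i + d].
// The access pattern is regular and contiguous, so the loop is much
// easier for the compiler to vectorize and streams through the cache.
double sum_x_flat(const std::vector<double>& x, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += x[3 * i];
    return s;
}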

> i have not seen the article, but it is fairly easy to generate
> lopsided benchmarks with LAMMPS for inhomogeneous systems.
>
> I agree. Nevertheless, I am intrigued.

difficult to say how much of these methods can be added without losing
generality and effectively rewriting LAMMPS from the ground up. there
are people looking into different approaches already. we had quite a
bit of discussion during the recent LAMMPS workshop in ABQ about how,
and how much, the data structures inside LAMMPS need to be changed to
make the code ready for modern heterogeneous computing environments.
this is an ongoing discussion, and a paper like the one you pointed
out is certainly worth careful consideration.

for your specific use case, i would think that the first course of
action would be to try improving the multi-thread performance (and add
load balancing to it) and then run in hybrid MPI+OpenMP mode. that is
where i personally would expect the best "return on investment"
currently.

axel.

Just a quick note:

> looking at the paper more closely, i would claim that quite a bit of
> the performance gains over LAMMPS are due to much more effective data
> structures. LAMMPS uses a lot of Array-of-Arrays structures, which are
> poison for performance in C/C++: they prevent vectorization and make
> data access with multi-threading less efficient.

The points that Axel raises here are important for high-performance code in general, but less so for the comparisons in the paper: the actual pairwise interaction calculations, which typically dominate MD runtimes (insert suitable caveats for long-ranged electrostatics etc.), are pretty comparable in both LAMMPS and the code described in the article.

Hopefully this can be seen in the timing comparisons for the simple Lennard-Jones fluids at different densities in the article: both LAMMPS and the code I wrote are fairly close, performance-wise, when run as single-core simulations to remove the influence of the load balancing and communication routines.

The basic difference is that the usual "stacked parallelepipeds" domain decomposition is difficult to apply to the sort of systems discussed in the paper, as you're basically using a series of regular planes along each axis to divide the total system into smaller boxes. It's tricky to do that and ensure each domain has a similar particle count, while also avoiding "empty" boxes which waste CPUs.
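
To put a rough number on that, here is a quick illustrative sketch (again my own, not code from the paper or from LAMMPS): bin a configuration onto a regular processor grid and compare the busiest domain to the average. For something like a dense aggregate surrounded by mostly empty (implicit-solvent) space, the max/mean ratio becomes large and most domains sit nearly idle.

#include <algorithm>
#include <vector>

struct P3 { double x, y, z; };

// Count particles per domain of a regular nx x ny x nz processor grid and
// return max/mean, a rough load-imbalance factor (1.0 = perfectly balanced).
double imbalance(const std::vector<P3>& p, double box,
                 int nx, int ny, int nz) {
    std::vector<int> count(nx * ny * nz, 0);
    for (const P3& a : p) {
        int ix = std::min(nx - 1, static_cast<int>(a.x / box * nx));
        int iy = std::min(ny - 1, static_cast<int>(a.y / box * ny));
        int iz = std::min(nz - 1, static_cast<int>(a.z / box * nz));
        ++count[(iz * ny + iy) * nx + ix];
    }
    double mean = static_cast<double>(p.size()) / count.size();
    int maxc = *std::max_element(count.begin(), count.end());
    return maxc / mean;
}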

Having said that, the LAMMPS load balancer does a very good job for many "sparse" systems, and I recommend people give it a try!

J.

Thank you for the perspective, Axel and John.

I hadn’t tried OpenMP with load balancing yet, as I am using a custom pair style. I’ll see how far that gets me and then consider my options.

Dan

> Thank you for the perspective, Axel and John.
>
> I hadn't tried OpenMP with load balancing yet, as I am using a custom pair
> style. I'll see how far that gets me and then consider my options.

if it is a regular pair-wise additive non-bonded potential, it should
be very easy to add OpenMP support using the scheme in USER-OMP.
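
the basic pattern looks roughly like this (just to show the idea, this
is not the actual USER-OMP code, which also manages per-thread force
buffers, half neighbor lists / newton's third law, and so on). with a
full neighbor list each thread only writes the forces of its own
atoms, so the loop threads without any reduction:

#include <cstddef>
#include <vector>

struct NeighList {
    // full neighbor list: if j is listed for i, then i is also listed for j
    std::vector<std::vector<int>> neigh;
};

// Lennard-Jones forces over a full neighbor list, threaded with OpenMP.
// Each outer iteration writes only f[3*i..3*i+2], so there are no write
// conflicts between threads.  Dynamic scheduling helps when neighbor
// counts per atom are very uneven (inhomogeneous systems).
void pair_lj_omp(const std::vector<double>& x,   // 3*natoms coordinates
                 std::vector<double>& f,         // 3*natoms forces, pre-zeroed
                 const NeighList& list,
                 double epsilon, double sigma, double rcutsq)
{
    const std::size_t nlocal = list.neigh.size();
    #pragma omp parallel for schedule(dynamic, 32)
    for (std::size_t i = 0; i < nlocal; ++i) {
        double fx = 0.0, fy = 0.0, fz = 0.0;
        for (int j : list.neigh[i]) {
            const double dx = x[3*i]   - x[3*j];
            const double dy = x[3*i+1] - x[3*j+1];
            const double dz = x[3*i+2] - x[3*j+2];
            const double rsq = dx*dx + dy*dy + dz*dz;
            if (rsq >= rcutsq) continue;
            const double sr2 = sigma * sigma / rsq;
            const double sr6 = sr2 * sr2 * sr2;
            const double fpair = 24.0 * epsilon * sr6 * (2.0 * sr6 - 1.0) / rsq;
            fx += fpair * dx;
            fy += fpair * dy;
            fz += fpair * dz;
        }
        f[3*i]   += fx;
        f[3*i+1] += fy;
        f[3*i+2] += fz;
    }
}

for a half list with newton on, you need per-thread force accumulators
plus a reduction at the end, which is essentially what the USER-OMP
infrastructure takes care of for you.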

axel.