Hello,
For consideration, attached is a small package/patch, COMM_NPROCS, that addresses some scalability issues I’ve observed when setting up large molecular systems on large numbers of MPI ranks (e.g., 100M+ particles on 10K+ MPI ranks). The suggested changes largely target logic in a few places that loops over all processors, which becomes prohibitively expensive when setting up large-scale runs. Most users will likely see negligible performance impact from these changes, but if you regularly run large systems on 10K+ MPI ranks and observe long setup times when replicating systems, building molecular topologies, or creating remap plans for 3D FFTs, they may be worth a look.
With these changes, I was able to successfully set up a modified rhodo replicated system (without a kspace method) with 36.86 billion particles on 786,432 MPI ranks in 42 seconds, compared to the originally projected time of 18 hours. Again, this was just setting up the simulation in LAMMPS. With these changes, I’m currently projecting the PPPM setup time to be ~6 minutes for the same system, compared to the originally projected time of 12 days. In this case, the PPPM setup time would improve further if an additional hint could be passed during creation of one of the plans (the last one).
My edits and suggestions certainly don’t cover 100% of use cases in LAMMPS (everything works with rhodo), but hopefully they can all be adapted and included in the LAMMPS source and, at the very least, activated in certain cases and/or by specific request from a user with additional keywords, as in my implementation. The modifications are based on a recent git pull.
Thanks for your consideration,
chris
COMM_NPROCS.tar.gz (20.2 KB)