anyone (else) needing better parallel scaling for fix rigid style integration?

hi everybody,

recently i was contacted about improving the parallel performance of
fix rigid and its siblings in the rigid package via multi-threading.
while working on this (the resulting code has just been added to
LAMMPS-ICMS, btw), it became evident that a significant part of the
scaling problem lies in the replicated-data parallelization of the
rigid bodies themselves.
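to make the problem concrete, here is a toy cost model (not measured data; all constants are invented, only the trend matters) of what a replicated-data scheme costs: every rank holds all bodies, and the per-body sums are combined with one global allreduce over 6*nbody doubles each step, so communication does not shrink as you add ranks:

```python
import math

# toy cost model for a replicated-data rigid-body scheme:
# every rank holds all nbody bodies and the per-body force/torque
# sums are combined with one MPI_Allreduce over 6*nbody doubles
# per timestep.  constants below are made up; only the trend matters.

def step_time_replicated(nbody, nranks, compute_per_body=1.0,
                         comm_per_double=0.01):
    """model of per-step wall time with a global allreduce."""
    compute = compute_per_body * nbody / nranks   # compute shrinks with P
    if nranks == 1:
        return compute                            # no communication on 1 rank
    # the allreduce moves 6*nbody doubles and costs ~log2(P) hops,
    # independent of how the bodies are distributed
    comm = comm_per_double * 6 * nbody * math.log2(nranks)
    return compute + comm

t1 = step_time_replicated(1000, 1)
t64 = step_time_replicated(1000, 64)
efficiency = t1 / (64 * t64)
print(f"parallel efficiency on 64 ranks: {efficiency:.2f}")  # well below 1
```

in this model the per-rank compute shrinks like 1/P while the allreduce term stays put (or grows with log P), so parallel efficiency collapses once communication dominates.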

i'm pondering some strategies and have a couple of ideas for how to
improve this, but before investing a lot of effort, it would be
helpful to know the typical scenarios where people use fix rigid
& co. and how much those scenarios are impacted by the communication
overhead.

please let me know, and also whether you would be able to provide some
test inputs and run some tests on my behalf.

thanks in advance,

Coincidentally, I've been working on a fix rigid/small
command for the last week or so. It doesn't use
MPI_Allreduce, but communicates locally. It should
be ready for release soon, possibly today.
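As a rough illustration of why local communication should scale better (a sketch under assumed, invented constants, not a description of the actual implementation): with a spatial decomposition each rank owns roughly nbody/P bodies and exchanges their partial sums only with a bounded set of neighboring ranks, so per-rank communication shrinks with P instead of staying constant.

```python
# toy cost model for a neighbor-only communication scheme.
# each rank owns ~nbody/P bodies and exchanges their partial sums
# with a fixed number of neighboring ranks.  constants are invented;
# only the trend is meaningful.

def step_time_local(nbody, nranks, compute_per_body=1.0,
                    comm_per_double=0.01, neighbor_ranks=6):
    """model of per-step wall time with neighbor-only exchanges."""
    bodies_per_rank = nbody / nranks
    compute = compute_per_body * bodies_per_rank
    if nranks == 1:
        return compute                # no communication on 1 rank
    # per-rank traffic is proportional to the bodies it owns,
    # so it shrinks as more ranks are added
    comm = comm_per_double * 6 * bodies_per_rank * neighbor_ranks
    return compute + comm

t1 = step_time_local(1000, 1)
t64 = step_time_local(1000, 64)
efficiency = t1 / (64 * t64)
print(f"parallel efficiency on 64 ranks: {efficiency:.2f}")  # remains high
```

Here both the compute and communication terms scale like 1/P, so the modeled efficiency stays roughly flat as ranks are added, in contrast to a global-allreduce scheme.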


Just released a new fix rigid/small command today.