I would like to run LJ simulations with a very large cutoff (equal to half the box length for quite large simulations). I am studying finite-size properties near the critical point and I need to capture all possible fluctuations in the simulation cell. However this greatly decreases parallel efficiency as it increases the communication time between processors. Is there a way to use the USER-OMP package to increase efficiency, or is the only solution brute force?

Stan, are you sure this is due to communication issues?

...and not due to the fact that the computational effort
increases O(R**6) with the cutoff R: you have N*(N-1)
pair interactions to compute, and the number of atoms N
within the cutoff increases O(R**3).

Please provide a representative input that exhibits
the issue and I am happy to run a few quick benchmarks
to give you an appreciation of what the performance
will be like for different combinations of MPI and OpenMP
and for different cutoffs.

After some thinking: you may be able to avoid
some communication at the expense of more
computation by using

newton off

Using /omp pair styles may help in that case, too,
at least in theory, since they allow a more
favorable ratio of owned to ghost atoms (which need
communication with "newton on") through larger
per-process domains. But their parallel efficiency is
not always as good for simple, dense systems. You
certainly want to try my latest development version
over the one that is bundled with the current LAMMPS
release, since it has much improved parallel
efficiency for the /omp code.
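Concretely, the combination described above could be tried with a few extra lines in the input (4 threads per MPI rank is just an example; the right split depends on the hardware):

```
# enable OpenMP-threaded styles and switch off Newton's third law
# for pairwise interactions (trades communication for extra computation)
package      omp 4
suffix       omp
newton       off
```

In current LAMMPS versions the same can be requested from the command line via the -sf omp and -pk omp switches instead of editing the input.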

Nah - for fixed N = # of atoms,
the computation cost scales as O(R^3) with the cutoff distance R,
but that's bad enough. If the ratio of communication
to computation is not going up with the cutoff, there's
probably not much you can do about it.
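The O(R^3) scaling follows from a simple neighbor-count argument: at fixed number density rho, each atom interacts with every atom inside its cutoff sphere, so for fixed N the total pairwise work grows with the sphere volume:

```latex
% per-atom neighbor count within cutoff R at number density \rho
n_\text{neigh} \approx \tfrac{4}{3}\pi R^{3}\rho

% total pairwise work for N atoms (half neighbor lists)
W \approx \frac{N}{2}\, n_\text{neigh}
  = \frac{2\pi}{3}\, N \rho R^{3}
  = O(R^{3}) \quad \text{for fixed } N, \rho
```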