You are probably right. I ran a test case for the LJ fluid and even though
the communication time is high, adding more processors doesn't seem to
increase communication time as much as I thought it would. I use InfiniBand,
and that really helps.
I have tried newton off in the past for a long-chain alkane system similar
to the test case, and that really helped too. Another issue I have is that my
system is highly inhomogeneous (liquid-like in the middle and vapor-like on
both ends). I'm curious to see if omp will help.
so i made a quick test on my desktop and can conclude
that the /omp styles are affected by load balancing issues
even more than the domain decomposition. the current
threading scheme assumes a similar number of neighbors
per atom, and thus distributes atoms in the pair loop evenly
over the threads. in your input one can even see from the
cpu usage report in top that there are always 1-2 threads
that need much longer to finish than the others, so the
resulting performance is worse.
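
to illustrate the effect described above, here is a minimal sketch
(not the actual /omp implementation) of what happens when atoms are
split evenly over threads while the neighbor counts differ a lot. the
neighbor counts, the atom ordering along the slab axis, and the
function names are all made-up assumptions for illustration only.

```python
def even_split(n_atoms, n_threads):
    """Split atom indices into contiguous, (nearly) equal-sized chunks."""
    base, extra = divmod(n_atoms, n_threads)
    chunks, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def thread_loads(neighbor_counts, chunks):
    """Work per thread, approximated as the number of pair interactions."""
    return [sum(neighbor_counts[i] for i in chunk) for chunk in chunks]


def imbalance(loads):
    """Slowest thread relative to the average thread (1.0 = balanced)."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical vapor / liquid / vapor slab, atoms ordered along the long
# axis: few neighbors per atom in the vapor, many in the liquid.
neighbor_counts = [2] * 100 + [40] * 200 + [2] * 100

chunks = even_split(len(neighbor_counts), 4)
loads = thread_loads(neighbor_counts, chunks)

print(loads)  # [200, 4000, 4000, 200]
print(f"imbalance factor: {imbalance(loads):.2f}")
```

every thread gets the same number of atoms, but the threads that own
the liquid-like atoms do most of the work, and the others sit idle
until those are done.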
i'll keep trying with more nodes, but i suspect that the same
is going to be the case. i am very grateful for your example,
as it gave me some food for thought about how
to improve the load balancing across threads, and i
may get back to you if i find a more efficient way to
run this kind of input.
right now the only suggestion that i have is to use
the lj/cut/opt pair style instead of lj/cut to squeeze
out at least a little speedup.
sorry for not having any better answer
and thanks for the example. i very
much appreciate it.