Timing breakdown and acceleration

Hi,
I have another question regarding the run provided in the ‘bench’ directory of the LAMMPS distribution. The output says that the ‘Neigh’ section consumes about 47% of the total time and the ‘Pair’ part nearly 20%.

To improve the ‘Pair’ timing, I may use more processes (MPI or OpenMP).
But for this problem, a more lucrative option is to reduce the time spent in the ‘Neigh’ section.
How can I do that?
I tried using more processes for this, but the ‘Neigh’ time remains the same.
Any suggestions?

Thanks,
PKN

Hi,
I have another question regarding the run provided in the ‘bench’ directory of the LAMMPS distribution. The output says that the ‘Neigh’ section consumes about 47% of the total time and the ‘Pair’ part nearly 20%.

Are you sure you are not misreading the output?
This kind of timing breakdown would indicate some problem in your input deck, or a very, very unusual system geometry.

To improve the ‘Pair’ timing, I may use more processes (MPI or OpenMP).
But for this problem, a more lucrative option is to reduce the time spent in the ‘Neigh’ section.
How can I do that?
I tried using more processes for this, but the ‘Neigh’ time remains the same.
Any suggestions?

I would say you are looking at the wrong “knobs”. I would first run with just 1 MPI rank and no threads and find a way to optimally balance Pair, Neigh, and the rest.

Only then is it worth looking at parallelization issues. Please see my other response on that.

Axel.

Dear Axel,
Thanks for your reply.
Actually, I have not run it.
I am just quoting from the log file (log.6Oct16.chain.fixed.icc.1) distributed with the LAMMPS (2019) distribution.
The result is for 1 MPI rank (given below).

The output is a bit confusing for me too.
The only thing that came to my mind is that the neighbour list is being rebuilt every step.

neighbor 0.4 bin
neigh_modify every 1 delay 1

Thanks in advance for any suggestion.
Best,
PKN

LOG OUTPUT

Dear Axel,
Thanks for your reply.
Actually, I have not run it.
I am just quoting from the log file (log.6Oct16.chain.fixed.icc.1) distributed with the LAMMPS (2019) distribution.
The result is for 1 MPI rank (given below).

The output is a bit confusing for me too.
The only thing that came to my mind is that the neighbour list is being rebuilt every step.

No, it is not. ‘every 1 delay 1’ (with the implicit ‘check yes’) does not force a rebuild at every step; it only makes LAMMPS consider a rebuild at every step. At each check, LAMMPS measures how far atoms have traveled since the last rebuild, and the neighbor list is rebuilt only if at least one atom has moved more than half the skin distance. The total number of neighbor list rebuilds is 25 for 100 MD steps, so one every four steps.
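To spell that out, here is an annotated version of the two input lines in question, with the implicit ‘check yes’ written out (the comments are my paraphrase of the documented neigh_modify behavior, not part of the benchmark input):

neighbor     0.4 bin                     # skin distance of 0.4 sigma, binned list construction
neigh_modify every 1 delay 1 check yes   # every 1:   consider a rebuild at every step
                                         # delay 1:   but wait at least 1 step after the last rebuild
                                         # check yes: rebuild only if some atom moved more than half the skin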

This is a system with very low density and a very short cutoff (1.12 sigma), resulting in fewer than 5 neighbors per atom on average, and thus very little time is spent computing non-bonded forces.
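For context, the pair setup in bench/in.chain looks like this, as far as I remember (please verify against the file in your distribution):

pair_style lj/cut 1.12           # cutoff at roughly 2^(1/6) sigma, i.e. only the repulsive core
pair_coeff 1 1 1.0 1.0 1.12      # epsilon, sigma, cutoff in reduced LJ units

With such a short cutoff, each atom sees only a handful of neighbors, so the Pair section has comparatively little work to do.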

This is a bit of an extreme system, which is why it is included in the benchmarks folder as a contrast to the in.lj test, which is the opposite extreme (only about 10% of its time is spent building neighbor lists, in part because they need to be rebuilt less frequently for a denser system).
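For comparison, the neighbor-related settings in bench/in.lj are, as far as I remember (again, verify against the distributed file):

pair_style   lj/cut 2.5                  # the standard LJ cutoff: far more pairs per atom
neighbor     0.3 bin
neigh_modify delay 0 every 20 check no   # rebuild unconditionally every 20 steps

There the Pair section dominates and neighbor list construction is a small fraction of the total time.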

Axel.