Issue with the value of "%varavg"


Recently, I was optimizing the load balance of my simulations and I noticed that the value of %varavg reported after a run looked a little odd. Consider this output:

Loop time of 491.282 on 168 procs for 200 steps with 6008160 atoms

Performance: 0.018 ns/day, 1364.671 hours/ns, 0.407 timesteps/s
99.9% CPU use with 168 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
Pair    | 262.99     | 360.1      | 456.5      | 232.3 | 73.30
Bond    | 5.7858e-05 | 8.6682e-05 | 0.00014435 |   0.0 |  0.00
Neigh   | 4.38       | 6.9257     | 8.8372     |  43.7 |  1.41
Comm    | 1.2119     | 85.953     | 191.38     | 467.9 | 17.50
Output  | 11.409     | 12.668     | 21.68      |  47.4 |  2.58
Modify  | 3.1396     | 16.229     | 55.382     | 353.2 |  3.30
Other   |            | 9.406      |            |       |  1.91

The manual states that:

%varavg is the percentage by which the max or min varies from the average.

However, this isn’t the case for the output above. For the Pair timings, for example, the value of %varavg is about 230%, while the difference between min time and avg time, divided by the latter, is only ~27%.
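As a quick check of that ~27% figure, here is a minimal Python sketch using the Pair row of the output above. The helper name `relative_deviation` is made up for illustration and is not part of LAMMPS.

```python
# Timings copied from the "Pair" row of the output above (seconds).
pair_min, pair_avg, pair_max = 262.99, 360.1, 456.5


def relative_deviation(value, avg):
    """Absolute deviation of value from avg, as a percentage of avg."""
    return abs(value - avg) / avg * 100.0


dev_min = relative_deviation(pair_min, pair_avg)
dev_max = relative_deviation(pair_max, pair_avg)

print(f"min deviates from avg by {dev_min:.1f}%")
print(f"max deviates from avg by {dev_max:.1f}%")
```

Both deviations come out close to 27%, far below the reported 232.3%.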

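I looked at the relevant source code (LINK) and, if I read it correctly, the actual equation for %varavg is:

$$
\%\mathrm{varavg} = 100 \cdot \sqrt{\frac{\sum_{i=1}^{nprocs} \mathrm{time}_i^2}{\sum_{i=1}^{nprocs} \mathrm{time}_i} - \frac{1}{nprocs}\sum_{i=1}^{nprocs} \mathrm{time}_i}
$$

where nprocs is the number of MPI tasks used and time_i is the calculation time on MPI task i.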

It isn’t any deviation metric I know. Moreover, its units are s^(1/2), while it should be unitless.
Can someone confirm this is a bug, or did I just misunderstand something?

The formulation in the manual is not a good description of what this number represents.
First off, please note that “variance” != “distance”. If %varavg were what you assume, it would be a useless measure, since you can infer that property from the min/avg/max numbers directly.
What this represents is how close on average the various values are from the average.
Consider the situation where you have a bimodal distribution (i.e. timing is either close to min or close to max, but rarely close to avg) or a narrow distribution (i.e. timing is usually close to the average, but there are a few outliers). In the first case you would get a large %varavg value; in the second you would get a small value, even though min/avg/max could be the same. This is what “variance from the average” represents.
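The two cases can be made concrete with a minimal Python sketch. It assumes %varavg is 100 * sqrt(variance / mean), which is how the source code is read elsewhere in this thread. The `varavg` helper and the per-rank timings are invented for illustration.

```python
import math


def varavg(times):
    """100 * sqrt(variance / mean), the assumed form of %varavg."""
    n = len(times)
    mean = sum(times) / n
    mean_sq = sum(t * t for t in times) / n
    # mean_sq / mean - mean equals variance / mean; clamp tiny negative round-off.
    return 100.0 * math.sqrt(max(mean_sq / mean - mean, 0.0))


# Hypothetical per-rank timings (seconds); both sets have min=1, avg=2, max=3.
bimodal = [1.0] * 50 + [3.0] * 50       # half the ranks fast, half slow
narrow = [1.0] + [2.0] * 98 + [3.0]     # nearly all ranks at the average

print(f"bimodal: {varavg(bimodal):.1f}")
print(f"narrow:  {varavg(narrow):.1f}")
```

With identical min/avg/max, the bimodal set gives a value several times larger than the narrow one, which is information the three columns alone do not carry.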

I assumed the manual was precise, so I gave an upper bound for the value of %varavg. What does this number represent? It isn’t variance divided by mean. The presence of a square root hints that it may be the relative standard deviation, but then the formula would be different.

I just gave you a description of what the intention is. That should suffice. It is meant to be a useful parameter that provides information that is not available directly. Feel free to ignore it if you don’t understand it.

It has been too long since I did the math and implementation for me to recall all the details.

That may be the intention, but it isn’t what is calculated. You said that %varavg represents “how close on average the various values are from the average”. How can such a value be equal to 230% when the values differ by 27% at most?

Setting aside formulas, variance should have units of s^2, standard deviation should have units of s, and any relative value should be unitless. Meanwhile, %varavg is in s^(1/2), which just doesn’t fit, especially since LAMMPS reports it as “%varavg”.
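The units argument can be checked numerically. The sketch below again assumes %varavg = 100 * sqrt(variance / mean). The helper names and the sample timings are hypothetical. Expressing the same timings in milliseconds instead of seconds changes the assumed %varavg by a factor of sqrt(1000), while the relative standard deviation stays the same.

```python
import math
import statistics


def varavg(times):
    """Assumed form of %varavg: 100 * sqrt(population variance / mean)."""
    return 100.0 * math.sqrt(statistics.pvariance(times) / statistics.fmean(times))


def rel_std(times):
    """Relative standard deviation in percent (unitless)."""
    return 100.0 * statistics.pstdev(times) / statistics.fmean(times)


times_s = [1.0, 2.0, 2.0, 3.0]              # hypothetical timings in seconds
times_ms = [1000.0 * t for t in times_s]    # the same timings in milliseconds

ratio_varavg = varavg(times_ms) / varavg(times_s)
ratio_rel_std = rel_std(times_ms) / rel_std(times_s)

print(f"varavg changes by a factor of  {ratio_varavg:.2f}")
print(f"rel_std changes by a factor of {ratio_rel_std:.2f}")
```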

You are trying to argue something that is not worth arguing over, and I don’t understand why this worries you so much.

If you can come up with a better way (or better justified way or a better description) for a property equivalent to the information that this number currently provides, then feel free to submit a pull request. But mind you this has to be a property that gives the same kind of “measure” regardless of how many MPI ranks you have and how long your simulation has been running. This is currently achieved.

It’s a character flaw. :wink:

Anyway, I’ve now understood what %varavg actually is: the square root of the variance divided by the mean. And while I would say using the standard deviation would be more intuitive, I agree this is a minor issue, not really worth spending further time on.
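For anyone who prefers the standard deviation, it can be recovered from the reported numbers under that reading of the formula. A minimal Python sketch for the Pair row, assuming %varavg = 100 * sqrt(variance / mean) with times in seconds:

```python
import math

pair_avg = 360.1       # "avg time" for Pair, in seconds
pair_varavg = 232.3    # reported %varavg for Pair

# Invert the assumed relation %varavg = 100 * sqrt(variance / mean).
variance = (pair_varavg / 100.0) ** 2 * pair_avg
std_dev = math.sqrt(variance)
rel_std_percent = 100.0 * std_dev / pair_avg

print(f"standard deviation: {std_dev:.1f} s")
print(f"relative standard deviation: {rel_std_percent:.1f}%")
```

This gives a standard deviation of about 44 s, or about 12% of the average, which is consistent with the min and max lying within ~27% of it.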