Recently, I was optimizing the load balance of my simulations and noticed that the value of %varavg reported after a run looked a little odd. Consider the following output:
Loop time of 491.282 on 168 procs for 200 steps with 6008160 atoms
Performance: 0.018 ns/day, 1364.671 hours/ns, 0.407 timesteps/s
99.9% CPU use with 168 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 262.99 | 360.1 | 456.5 | 232.3 | 73.30
Bond | 5.7858e-05 | 8.6682e-05 | 0.00014435 | 0.0 | 0.00
Neigh | 4.38 | 6.9257 | 8.8372 | 43.7 | 1.41
Comm | 1.2119 | 85.953 | 191.38 | 467.9 | 17.50
Output | 11.409 | 12.668 | 21.68 | 47.4 | 2.58
Modify | 3.1396 | 16.229 | 55.382 | 353.2 | 3.30
Other | | 9.406 | | | 1.91
The manual states that:
%varavg is the percentage by which the max or min varies from the average.
However, this isn’t the case for the presented output: e.g. for the Pair timings, the value of %varavg is about 230%, while the difference between the min time and the avg time, divided by the latter, is only ~27%.
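For reference, the ~27% figure follows directly from the Pair row of the table above (a quick Python check; the variable names are mine):

```python
# Pair row from the MPI task timing breakdown above
min_t, avg_t, max_t = 262.99, 360.1, 456.5

dev_min = (avg_t - min_t) / avg_t  # how far min is below avg
dev_max = (max_t - avg_t) / avg_t  # how far max is above avg

print(f"{dev_min:.1%}, {dev_max:.1%}")  # ~27% each, nowhere near 232.3%
```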
I looked at the relevant source code (LINK) and, if I read it correctly, the actual equation for %varavg is:

%varavg = sqrt( (sum_i time_i^2)/nprocs / avg - avg ),  with avg = (sum_i time_i)/nprocs,

i.e. sqrt(variance/mean), where nprocs is the number of MPI tasks used and time_i is the calculation time on the i-th MPI task.
This isn’t any dispersion metric I know of; moreover, its units are s^(1/2), whereas such a quantity should be unitless.
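In Python terms, my reading of that source amounts to the following (a sketch; the function name and the list-based sums are mine, standing in for the MPI reductions):

```python
import math

def varavg(times):
    """The quantity I believe is computed: sqrt(<t^2>/<t> - <t>),
    which is algebraically sqrt(variance / mean).
    If the inputs are in seconds, the result is in s^(1/2)."""
    nprocs = len(times)
    mean = sum(times) / nprocs
    mean_sq = sum(t * t for t in times) / nprocs
    # clamp against tiny negative values from floating-point rounding
    return math.sqrt(max(0.0, mean_sq / mean - mean))
```

Identical timings on every rank give 0, as expected, but the value grows with the square root of the overall time scale.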
Can someone confirm that this is a bug, or have I just misunderstood something?
The formulation in the manual is not a good description of what this number represents.
First off, please note that “variance” != “distance”. If %varavg were what you assume, it would be a useless measure, since you could infer that property from the min/avg/max numbers directly.
What this represents is how close, on average, the various values are to the average.
Consider the situation where you have a bimodal distribution (i.e. timing is either close to min or close to max, but rarely close to avg) versus a narrow distribution (i.e. timing is usually close to the average, but there are a few outliers). In the first case you would get a large %varavg value, while in the second you would get a small one, even though min/avg/max could be identical in both cases. This is what “variance from the average” represents.
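To make the two scenarios concrete, here is a sketch with made-up per-rank timings, using sqrt(variance/mean) as the computed quantity:

```python
import math

def spread(times):
    # sqrt(variance / mean) -- the quantity under discussion
    n = len(times)
    mean = sum(times) / n
    var = sum((t - mean) ** 2 for t in times) / n
    return math.sqrt(var / mean)

# Two sets of 100 per-rank timings, both with min=1, avg=2, max=3:
bimodal = [1.0] * 49 + [2.0] * 2 + [3.0] * 49  # mass near min and max
narrow = [1.0] + [2.0] * 98 + [3.0]            # mass near the average

print(spread(bimodal))  # ~0.7
print(spread(narrow))   # ~0.1
```

Same min/avg/max in both cases, yet a sevenfold difference in the reported value.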
I assumed the manual is precise, which is why I gave an upper bound for the value of %varavg. So what does this number actually represent? It isn’t the variance divided by the mean. The presence of a square root hints that it may be the relative standard deviation, but then the formula would be different.
I just gave you a description of what the intention is. That should suffice. It is meant to be a useful parameter that provides information that is not available directly. Feel free to ignore it if you don’t understand it.
It has been too long since I did the math and the implementation to recall all the details.
It may be the intention, but it isn’t what is calculated. You said that %varavg “represents how close, on average, the various values are to the average”. How can such a value be equal to 230% when the values differ by 27% at most?
Setting formulas aside: a variance should have units of s^2, a standard deviation s, and any relative value should be unitless. Meanwhile, %varavg is in s^(1/2), which just doesn’t fit, especially since LAMMPS reports it as a percentage (“%varavg”).
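The unit mismatch is easy to show numerically: a unitless (percentage-like) quantity would not change if the same timings were expressed in milliseconds instead of seconds. A sketch with made-up per-rank times:

```python
import math

def spread(times):
    # sqrt(variance / mean), carrying units of sqrt(time)
    n = len(times)
    mean = sum(times) / n
    var = sum((t - mean) ** 2 for t in times) / n
    return math.sqrt(var / mean)

seconds = [262.99, 360.1, 456.5]        # hypothetical per-rank times in s
millis = [1000.0 * t for t in seconds]  # the same times in ms

print(spread(millis) / spread(seconds))  # ~31.6, i.e. sqrt(1000)
```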
You are trying to argue something that is not worth arguing over, and I don’t understand why this worries you so much.
If you can come up with a better way (or a better-justified way, or a better description) for a property equivalent to the information this number currently provides, then feel free to submit a pull request. But mind you, it has to be a property that gives the same kind of “measure” regardless of how many MPI ranks you have and how long your simulation has been running. This is currently achieved.
Anyway, I have now worked out what %varavg is: the square root of the variance divided by the mean. And while I would say using the standard deviation would be more intuitive, I agree this is a minor issue, not really worth spending further time on.
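For completeness, the closely related scale-invariant measure would be the relative standard deviation (sqrt(variance)/mean, a.k.a. the coefficient of variation), which is unitless. A sketch contrasting the two, with made-up timings:

```python
import math

def _mean_var(times):
    n = len(times)
    mean = sum(times) / n
    var = sum((t - mean) ** 2 for t in times) / n
    return mean, var

def varavg(times):
    mean, var = _mean_var(times)
    return math.sqrt(var / mean)  # units of sqrt(time)

def rel_std(times):
    mean, var = _mean_var(times)
    return math.sqrt(var) / mean  # unitless

times = [262.99, 360.1, 456.5]   # hypothetical per-rank timings, s
scaled = [1000.0 * t for t in times]

print(rel_std(times), rel_std(scaled))  # essentially identical (unitless)
print(varavg(times), varavg(scaled))    # differ by a factor of sqrt(1000)
```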