how to translate benchmark performance results to flops

LAMMPS results do not follow the standard way of reporting performance (in flops/sec).

Is there a way to translate the results, for example for the Lennard-Jones benchmark, into flops/sec?

example existing output:

Performance: 17997.357 tau/day, 41.661 timesteps/s
99.4% CPU use with 8 MPI tasks x 8 OpenMP threads

Can you provide more info on how to interpret these results and how to translate them to flops/sec?

LAMMPS results do not follow the standard way of reporting performance (in
flops/sec).

I strongly disagree that FLOPS is a meaningful descriptor of the
performance of an MD code. What matters is how quickly a defined task
is done, which is what LAMMPS reports. It would be easy to achieve a
higher FLOPS rating while at the same time having worse actual
performance. This is particularly true for MD codes. Example: when
running highly threaded and vectorized kernels, e.g. on GPUs or Xeon
Phi accelerators, it is more efficient to not take advantage of
Newton's third law and effectively double the number of floating point
operations per time step (and thus artificially inflate the FLOP
count) in order to reduce the overhead of atomic operations or waiting
on locks, whereas with serial or minimally threaded execution, one
would rather reduce the number of operations for more efficient
processing.

Is there a way to translate the results, for example for the Lennard-Jones
benchmark, into flops/sec?

No. This is a non-trivial operation. The number of floating point
operations varies with the number of neighbors: you have a different
number of floating point operations for pairs of atoms that are within
the cutoff than for those outside it. On top of that, you have
floating point operations associated with other operations, e.g. the
neighbor list builds, that are difficult to estimate or would incur
unacceptable overhead if collected/computed.

example existing output:

Performance: 17997.357 tau/day, 41.661 timesteps/s
99.4% CPU use with 8 MPI tasks x 8 OpenMP threads

Can you provide more info on how to interpret these results and how to
translate them to flops/sec?

For interpreting the screen output, see:

http://lammps.sandia.gov/doc/Section_start.html#lammps-screen-output

As for translating them to flops/sec: as stated above, determining the
number of floating point operations is difficult unless one accepts a
lot of unwanted overhead.
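As a side note on reading the "Performance:" line itself: the two numbers are two views of the same measurement, and their ratio recovers the simulation timestep. A minimal sketch, assuming the LJ benchmark's default timestep of 0.005 tau:

```python
# The "Performance:" line reports tau/day and timesteps/s.
# tau/day = (tau per timestep) * (timesteps per second) * (seconds per day),
# so the timestep size can be recovered from the two reported numbers.

tau_per_day = 17997.357       # from the "Performance:" line
timesteps_per_sec = 41.661    # from the "Performance:" line
seconds_per_day = 86400

# Inferred simulation timestep in LJ time units (tau):
dt = tau_per_day / seconds_per_day / timesteps_per_sec

print(f"dt = {dt:.4f} tau")   # ~0.005 tau, the default timestep in lj units
```

So "tau/day" is simply how much simulated (reduced) time elapses per day of wall-clock time, which is usually the number one actually cares about.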

Please also note that "FLOPS/s" is redundant, as FLOPS is an
abbreviation for "floating point operations per second"; so it should
be either FLOPS or FLOP/s.

If you want a handle on the number of floating point (and SSE/AVX)
operations (and lots of other relevant performance metrics) occurring
during an MD run (or any executable, for that matter), your best bet
is to read the hardware performance counters built into your CPU, for
example using the "perf" tool:
https://perf.wiki.kernel.org/index.php/Main_Page
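A minimal sketch of such an invocation (the event names are CPU-specific assumptions -- the fp_arith_inst_retired.* events exist on recent Intel CPUs, and `lmp -in in.lj` stands in for whatever your LAMMPS executable and input script are; run `perf list` to see what your hardware actually offers):

```shell
# Count retired floating-point instructions for a LAMMPS run (Linux only).
# Event names vary by CPU vendor/generation; check `perf list` first.
perf stat \
    -e fp_arith_inst_retired.scalar_double \
    -e fp_arith_inst_retired.256b_packed_double \
    lmp -in in.lj
```

Dividing the reported counts by the elapsed time then gives a measured (not estimated) FLOP/s figure, with essentially no overhead since the counting happens in hardware.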

axel.

I would go one step further and argue that FLOPS is only a sensible metric for hardware, not software. The most “performant” software in terms of FLOPS would run an infinite loop with some floating point operations inside and nothing else, but that would hardly be worth running.

The inner loop of the LJ potential does about 25 flops per pairwise
interaction. And with newton on, each IJ pair is computed only once,
not twice. So with that and the number of neighbors per atom, you can
get a flop rate.

For any other pair style in LAMMPS, you would have to hand-count the
# of flops per pairwise interaction.
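A back-of-the-envelope sketch of that estimate. The atom count and average neighbor count below are made-up illustrative numbers (not taken from the output above); only the ~25 flops/pair figure and the timesteps/s come from this thread:

```python
# Rough FLOP-rate estimate for the LJ benchmark, per the rule of thumb
# above: ~25 flops per pairwise interaction, each IJ pair counted once
# when "newton on" is set.

FLOPS_PER_PAIR = 25         # approximate flops in the LJ inner loop
natoms = 32000              # hypothetical system size
neighbors_per_atom = 37     # hypothetical average neighbor count
timesteps_per_sec = 41.661  # from the "Performance:" line

# With newton on, each IJ pair is counted once: N * neighbors / 2 pairs.
pairs_per_step = natoms * neighbors_per_atom / 2
flops_per_step = pairs_per_step * FLOPS_PER_PAIR
flop_rate = flops_per_step * timesteps_per_sec   # flops per second

print(f"~{flop_rate / 1e9:.2f} GFLOP/s (pair force only)")
```

Note this counts only the pair-force work; neighbor list builds, time integration, and communication are excluded, which is exactly why such a number understates the total and should be treated as an estimate.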

Steve