regarding scaling up LAMMPS on CPU/GPU

Quang_Ha · July 4, 2017, 4:37pm

Hi all,

Happy 4th July!

Anyhow, I just have some questions regarding the scaling up potential of LAMMPS. I found the benchmark documentation here: http://lammps.sandia.gov/bench.html

According to the results, it seems like LAMMPS scales better on CPU compared to GPU. Is this always the case, i.e. should we always expect LAMMPS to perform better on, say, KNL than on, say, Titan X? I am asking simply because I need to specify the nodes for the supercomputer I am applying to use, so it would be better to pick the ones I can get the most efficient results out of.

Or has my interpretation of the benchmark been wrong? Would love to hear some opinions, please.

Many thanks,
Quang

akohlmey · July 5, 2017, 11:54am

> Anyhow, I just have some questions regarding the scaling up potential of
> LAMMPS. I found the benchmark documentation here:
> http://lammps.sandia.gov/bench.html
>
> According to the results, it seems like LAMMPS scales better on CPU compared
> to GPU. Is this always the case, like do we always expect LAMMPS to perform
> better on, say, KNL when compared to, say, Titan X?

scaling != performance.
also, you didn't mention whether you were looking at "strong scaling"
(i.e. same system size regardless of number of nodes) or "weak
scaling" (i.e. same system size per node).
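
To make the two definitions concrete, here is a minimal sketch of how parallel efficiency is commonly computed for each kind of test. The function names and all timings are made up for illustration and are not taken from the LAMMPS benchmark page.

```python
def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Strong scaling: total system size is fixed.

    Ideal run time on n nodes is t_ref * n_ref / n, so the efficiency is
    the ratio of that ideal time to the measured time t_n.
    """
    return (t_ref * n_ref) / (t_n * n)


def weak_scaling_efficiency(t_ref, t_n):
    """Weak scaling: system size per node is fixed.

    Ideal run time stays constant as nodes are added, so the efficiency
    is simply the reference time divided by the measured time.
    """
    return t_ref / t_n


# Hypothetical wall-clock times in seconds (assumed values, not measurements).
strong = strong_scaling_efficiency(t_ref=100.0, n_ref=1, t_n=30.0, n=4)
weak = weak_scaling_efficiency(t_ref=100.0, t_n=110.0)

print(f"strong scaling efficiency on 4 nodes: {strong:.2f}")
print(f"weak scaling efficiency: {weak:.2f}")
```

An efficiency of 1.0 would be ideal scaling in either case; the two numbers answer different questions and should not be compared with each other.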

parallel scaling depends primarily on two factors: the amount of extra work
required when running in parallel, and the overhead caused by
communication.
when you increase your per node performance (e.g. by adding one or
more GPUs), then your communication overhead will show more
drastically. similarly, with GPUs the optimal utilization requires a
much larger number of particles per node, thus for "strong scaling"
tests, you will see a drop in performance (and thus scaling) once you
drop below that number.
there are plenty of cases, where you can get the best absolute
performance (i.e. the performance when the application scales out)
with CPUs, yet that requires 3-10x as many nodes, as with nodes
containing accelerators. or looking at it the other way around, the
biggest impact of GPUs is usually with a small to moderate number of
nodes.
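
The argument above can be illustrated with a deliberately crude toy model: time per step is compute time (which shrinks with node count) plus a fixed communication cost once more than one node is used. The rates, the 5x GPU speedup, and the communication cost are all assumed numbers chosen for illustration, and the model ignores the GPU under-utilization at small particle counts mentioned above.

```python
def time_per_step(n_atoms, n_nodes, atoms_per_second, comm_seconds):
    """Toy model: perfectly divided compute plus a fixed communication cost.

    comm_seconds is only paid when running on more than one node.
    """
    compute = n_atoms / (n_nodes * atoms_per_second)
    comm = comm_seconds if n_nodes > 1 else 0.0
    return compute + comm


def strong_efficiency(n_atoms, n_nodes, atoms_per_second, comm_seconds):
    """Strong scaling efficiency relative to a single node."""
    t1 = time_per_step(n_atoms, 1, atoms_per_second, comm_seconds)
    tn = time_per_step(n_atoms, n_nodes, atoms_per_second, comm_seconds)
    return t1 / (n_nodes * tn)


# All values below are assumptions for illustration only.
N_ATOMS = 1_000_000
CPU_RATE = 1.0e6   # atom-steps per second per CPU node (assumed)
GPU_RATE = 5.0e6   # atom-steps per second per GPU node (assumed 5x faster)
COMM = 0.05        # seconds of communication per step (assumed)
NODES = 16

cpu_time = time_per_step(N_ATOMS, NODES, CPU_RATE, COMM)
gpu_time = time_per_step(N_ATOMS, NODES, GPU_RATE, COMM)
cpu_eff = strong_efficiency(N_ATOMS, NODES, CPU_RATE, COMM)
gpu_eff = strong_efficiency(N_ATOMS, NODES, GPU_RATE, COMM)

print(f"CPU nodes: {cpu_time:.4f} s/step, efficiency {cpu_eff:.2f}")
print(f"GPU nodes: {gpu_time:.4f} s/step, efficiency {gpu_eff:.2f}")
```

In this sketch the GPU nodes finish each step sooner but show the lower efficiency, because the same communication cost is a larger fraction of a shorter step. That is the sense in which scaling and performance are different things.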

> Reasons for asking
> simply because I need to specify the nodes for the supercomputer I am
> applying to use, so it would be better to use the one which I can get the most
> efficient results out of.

the numbers on the LAMMPS webpage are at best a guideline for what to
expect. you cannot derive actual performance information from it,
because many factors, including the details of your input file,
determine performance. the benchmark numbers are usually for the
"pure" MD simulation, without any analysis computes, dumps for output
and typically for systems with good load balance. most real-world
simulations will diverge from that. the only way to find out for
certain is to run benchmarks of your own with a representative input
of a representative system at the suitable size and on the machine
where you plan to run.
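
As a starting point for collecting such numbers, here is a small sketch that pulls the timing summary out of a LAMMPS log. It assumes the log contains a line of the form "Loop time of ... on ... procs for ... steps with ... atoms"; check this against the output of your own LAMMPS version. The sample line and the helper name are hypothetical.

```python
import re

# Assumed format of the timing summary line in a LAMMPS log file.
LOOP_RE = re.compile(
    r"Loop time of\s+([0-9.eE+-]+)\s+on\s+(\d+)\s+procs\s+for\s+(\d+)\s+steps"
    r"\s+with\s+(\d+)\s+atoms"
)


def parse_loop_time(log_text):
    """Return timing info from the last 'Loop time' line, or None if absent."""
    matches = LOOP_RE.findall(log_text)
    if not matches:
        return None
    seconds, procs, steps, atoms = matches[-1]
    seconds = float(seconds)
    steps = int(steps)
    return {
        "seconds": seconds,
        "procs": int(procs),
        "steps": steps,
        "atoms": int(atoms),
        "steps_per_second": steps / seconds,
    }


# Hypothetical log excerpt, not real benchmark output.
sample_log = "Loop time of 12.5 on 16 procs for 1000 steps with 32000 atoms\n"
result = parse_loop_time(sample_log)
print(result)
```

Running the same representative input at several node counts, on each candidate machine, and comparing `steps_per_second` gives the kind of measurement that the published benchmark numbers cannot provide.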

axel.

Titusi_Forum · July 21, 2017, 2:57am

You could try running your LAMMPS simulations on kogence.com.
You can quickly try both CPU and GPU machines and see how performance changes for your application.