Which GPU to buy?

Hello,

We’re interested in using LAMMPS with GPU acceleration, and we’re not sure about cost/speed ratios. We don’t have any GPU experience, but based on what we’ve read on the internet we’ve narrowed it down to:

  • Tesla GPU
  • GeForce GTX 680 or 670

(https://developer.nvidia.com/cuda-gpus)

The GeForce 680 is about 1.5x the price of the 670.
A Tesla is about 10x the price of the 670.

From what we’ve read, Tesla is not going to result in a significant speed-up over that offered by GeForce for our standard LAMMPS installation. Can anyone confirm this? What speed-up, if any, could we expect?

Finally, I imagine that we won’t get 1.5x the speed by paying 1.5x the price for a 680 instead of a 670?

Thank you for your advice.

Best regards,

James

Hello,

We're interested in using LAMMPS with GPU, and we're not sure about
cost/speed ratios.

We don't have any GPU experience but based on what we've read on the
internet we narrowed it down to:
- Tesla GPU

which tesla? fermi-based or kepler-based?

- GeForce GTX 680 or 670

this is an ill-posed comparison. for GPU computing
the choice and configuration of the host system is
almost as important as the choice of GPU itself. for
any newer GPU you want a system with a suitable
PCI-e 3.x slot and a matching card. you need to push
a lot of data to the GPU and back, so you want as
much bandwidth as you can get (which makes the
dual-GPU GTX 690, with two GPUs sharing one slot,
not such a good choice).
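
to get a feel for how much the bus matters, here is a rough
back-of-the-envelope sketch. the atom count and the effective
bus bandwidths below are illustrative assumptions, not
measurements:

```python
# rough sketch: per-timestep PCI-e traffic for a GPU-accelerated
# pair computation. all numbers are illustrative assumptions.

n_atoms = 1_000_000
bytes_per_atom = 3 * 8                  # x, y, z in double precision
traffic = 2 * n_atoms * bytes_per_atom  # positions down, forces back up

pcie2_bw = 8e9    # ~8 GB/s effective, PCI-e 2.0 x16 (assumed)
pcie3_bw = 16e9   # ~16 GB/s effective, PCI-e 3.0 x16 (assumed)

t2 = traffic / pcie2_bw
t3 = traffic / pcie3_bw
print(f"PCI-e 2.0: {t2 * 1e3:.1f} ms of transfers per step")
print(f"PCI-e 3.0: {t3 * 1e3:.1f} ms of transfers per step")
```

milliseconds per step spent only on moving data is time the GPU
cannot spend computing, which is why the slot and host system
matter.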

(https://developer.nvidia.com/cuda-gpus)

GeForce 680 is 1.5x the price of 670
Tesla is 10x the price of 670

From what we've read, Tesla is not going to result in a significant
speed-up over that offered by GeForce for our standard

the tesla vs. geforce speedup depends a lot on which tesla
you would get and how you compile LAMMPS. if you want to go
all double precision, there is more of a benefit to the tesla
than for mixed or single precision. also, the new "kepler"
based tesla models offer some features that should make
lammps perform much better when using multiple MPI tasks
with the same GPU; those features are not enabled on the
geforce.
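
as a side note on why the precision mode matters at all: every
floating point format carries a fixed number of significant
digits, so small contributions to a large accumulator can be
rounded away entirely. a quick illustration with python floats
(64-bit doubles, ~16 digits; single precision fails the same
way around ~7 digits):

```python
# finite precision: once an accumulator is large enough, adding
# a small value changes nothing. python floats are 64-bit doubles.

acc = 1.0e16                 # e.g. a long-running energy sum
print(acc + 1.0 == acc)      # -> True: the +1.0 is rounded away

acc = 1.0e15
print(acc + 1.0 == acc)      # -> False: still within ~16 digits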

but you are not going to get a speedup equivalent
to the cost difference.

LAMMPS installation. Can anyone confirm this? What speed-up, if any,
could we expect?

the speedup is impossible to predict. it depends far
too much on the calculations you want to do (and the
features you need). simple potentials (like lennard-jones)
with short cutoffs offer less potential for speedup
than computationally intense potentials like gay-berne
for ellipsoids or manybody potentials. it also depends
on the ratio of CPU cores to GPUs and on the host
system's overall performance. people have seen anything
from 15x-20x speedups to outright slowdowns.

please note that in most "promotional" material on GPUs
(and that includes many papers) the speedup is given
as a speedup over a single CPU core.
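
to put numbers on that: a hypothetical "20x vs. one core" claim
shrinks a lot once you compare against the whole node you would
otherwise be using. all figures below are made-up assumptions
for illustration:

```python
# how a per-core speedup claim translates to a per-node speedup.
# all numbers are illustrative assumptions.

single_core_time = 20.0   # time per step on one CPU core (arbitrary units)
gpu_time = 1.0            # time per step on the GPU -> the quoted "20x"

cores_per_node = 8
mpi_efficiency = 0.9      # assumed parallel efficiency on the full node
node_time = single_core_time / (cores_per_node * mpi_efficiency)

print(f"speedup vs. one core:  {single_core_time / gpu_time:.0f}x")
print(f"speedup vs. full node: {node_time / gpu_time:.2f}x")
```

the same GPU that looks 20x faster than one core is under 3x
faster than the 8-core node in this sketch.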

Finally, I imagine that we won't get 1.5x the speed by paying 1.5x the
price for a 680 instead of a 670?

you never get this. same as for CPUs.
if you really want the absolute best
price performance ratio you'd get a
raspberry pi board and no GPU.
can't beat that price.

axel.

Dear Axel,

Thanks for your insight and explanations. It’s very helpful.

One point I still don’t understand: the number of cores for the Tesla GPUs is relatively low compared to GeForce, and the memory bandwidth and FLOPS are similar (compare [1] and [2], for example). Based on these metrics alone, one would imagine the GeForce card to be faster. What is it about Tesla-based GPUs, therefore, that makes them faster than GeForce (especially for single precision, since I understand double precision is not possible with the latter)?

Thank you again for your time. I really appreciate it.

Best regards,
James

PS: If I get a chance to build a Raspberry Pi LAMMPS cluster, I’ll be sure to let you know the performance!

[1] http://www.nvidia.com/object/tesla-servers.html
[2] http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-680/specifications

Dear Axel,

Thanks for your insight and explanations. It's very helpful.

One point I still don't understand: the number of cores for the Tesla
GPUs is relatively low compared to GeForce, and the memory bandwidth and
FLOPS are similar (compare [1] and [2], for example). Based on these
metrics alone, one would imagine the GeForce card to be faster. What is it
about Tesla-based GPUs, therefore, that makes them faster than GeForce

who says that Tesla GPUs are faster than (high-end) GeForce?

for both the "G200" and the "Fermi" generation of nvidia GPUs,
the fastest single-GPU GeForce card (GeForce GTX 285 and GTX 480)
significantly outperformed the corresponding Tesla model (C1060, C2050)
when running MD in single or mixed precision. even for double precision,
the GeForce cards tend to do fairly well, since a lot of the performance
in MD is due not just to raw floating point throughput but also
to memory bandwidth, so the faster memory can compensate for the
lack of double precision capable floating point units to some degree,
specifically for potentials that don't require a lot of arithmetic.
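
the bandwidth-vs-flops trade-off can be sketched with a simple
roofline-style estimate. the hardware figures and arithmetic
intensities below are rough assumptions, not spec-sheet values:

```python
# roofline-style sketch: whether a kernel is bandwidth- or
# compute-bound depends on its arithmetic intensity (flops per
# byte moved). all numbers are rough assumptions.

peak_flops = 3.0e12   # ~3 TFLOP/s single precision (assumed)
peak_bw = 190e9       # ~190 GB/s memory bandwidth (assumed)

def attainable(intensity):
    """attainable flop rate for a kernel of given flops/byte."""
    return min(peak_flops, peak_bw * intensity)

lj_intensity = 4.0    # cheap pair potential: few flops per byte (assumed)
gb_intensity = 40.0   # expensive potential, e.g. gay-berne (assumed)

ridge = peak_flops / peak_bw
print(f"ridge point: {ridge:.1f} flops/byte")
print(f"cheap potential:     {attainable(lj_intensity) / 1e12:.2f} TFLOP/s")
print(f"expensive potential: {attainable(gb_intensity) / 1e12:.2f} TFLOP/s")
```

below the ridge point the kernel never sees the peak flop rate,
which is why faster memory can matter more than more (or better)
floating point units.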

(especially for single precision, since I understand double precision is not
possible with the latter)?

this is not correct. GeForce GPUs can do double precision, too.
the K10 card is described as having the same architecture as what
sits in the GeForce GTX 680, i.e. GK104 with the same number of
cores; otherwise it looks like a GeForce GTX 690 with more memory
and some extras like ECC enabled.

according to data available on wikipedia, its single vs. double
precision capability appears to be about the same, too, with the
GTX 680 being clocked higher. all the special features that nvidia
has announced that help with HPC (e.g. the extra performance boost
with lammps reported on ORNL's new Titan machine) are apparently
reserved for the K20 tesla GPU (with the GK110 architecture).

when comparing prices between GeForce and Tesla, you have to
take a number of issues into consideration:
- tesla cards are built for reliability and for being fully
  loaded 24/7. this means: more testing, more expensive
  components, more "manual" work (human workforce is what
  makes things expensive).
- tesla cards usually have much more RAM
- tesla cards have an extended warranty through nvidia
- tesla cards have better management and monitoring
  capabilities (which helps a lot when you are deploying
  a lot of them)
- tesla cards have (some) extras enabled that GeForce cards don't.

with geforce cards, vendors offer much more limited warranties
and often operate on the principle that they produce cheaper,
risk more failures, and replace what is broken when it breaks,
if the user notices at all. many (memory) errors that are
detected in GPU computing will never show up in regular use as
a graphics card (or you may not notice that some pixels are
colored wrong). thus geforce cards can be produced significantly
more cheaply, and you as a user run a higher risk and have more
manual work. ultimately, the situation is similar to that of
nvidia quadro GPUs, which are also practically the same GPU
chips, but with other feature sets enabled that are not
available in GeForce (plus tweaks and optimizations for stereo,
CAD, and related operations that don't matter in games).

in general, you always have to distinguish between an
effective sales pitch and real performance data. in the
IT business (and that applies to the entire supply chain,
from hardware to software vendors, from component
manufacturers to system integrators) it is common
practice to deceive customers by carefully omitting
data that would reveal the shortcomings of a specific
solution.

axel.