Dear lammps user,
We want to purchase a high performance computing cluster for running lammps
code with GPUs. Our simulation system consists of nearly 70,000 atoms. Can
someone shed some light on which of the latest processors suit lammps, and
on the optimal configuration of an HPC system with GPUs within $20,000 USD?
as you may guess, people like me get asked questions like yours a *LOT*,
and it is quite difficult to give good advice, since good advice needs
to be tailored specifically to your particular needs, local expertise,
and environment. it requires doing research and spending time and effort
checking on hardware developments. while there are no really big
revolutionary changes, the hardware market sees frequent incremental
changes at a pace that makes a recommended configuration nearly
obsolete by the time you make the recommendation. especially for
people on a small budget, there is also the risk factor: with little
money, you cannot afford to take risks, or you lose it all. that
means you need tried and tested hardware configurations and must not
jump on the "latest greatest" bandwagon that vendors keep pushing
forward all the time.
in short, nobody can afford to give you a recommendation for an
_optimal_ configuration. such a configuration needs to be custom tailored
to your needs, and - like a custom tailored suit or dress - that is
expensive and time consuming to do. remember, each time you add a degree of
freedom (CPU, RAM, network, GPU, storage), the number of possible
permutations grows, and you cannot optimize along every dimension at once.
thus, here are some general comments and questions that will hopefully
help you find a suitable answer yourself.
- a budget of $20,000 is *very* small for building an HPC cluster. the
basic infrastructure for an HPC cluster, i.e. the hardware you need,
but that you don't run calculations on can easily consume up to half
of that. are you willing to give away that much money up front?
- systems of 70,000 atoms are not very large, so you won't get much speedup
across multiple nodes unless you have a very expensive high-speed
interconnect, which doesn't make much sense for a very small budget.
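as a rough illustration of why 70,000 atoms limits multi-node scaling, here is a small python sketch; the 20-core node size is an illustrative assumption, and "communication dominates once you are down to a few hundred atoms per core" is a general MD rule of thumb, not a lammps guarantee:

```python
# how thin 70,000 atoms get spread as you add nodes to the job
atoms = 70_000
cores_per_node = 20  # illustrative node size, not a recommendation

for nodes in (1, 2, 4, 8):
    per_core = atoms / (nodes * cores_per_node)
    print(f"{nodes} node(s): {per_core:8.1f} atoms per core")

# once you are down to a few hundred atoms per core, communication
# typically dominates and extra nodes buy very little extra speed
```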
- HPC clusters need to be configured, housed in suitable
facilities (racks, power, cooling), and run by people with the proper
knowledge. do you have people with those skills on hand?
- how about your local experience with GPUs? did you do some
benchmarks? do the potentials you want to run properly support GPU
acceleration? vendors usually only support configurations with (very
expensive) dedicated "HPC GPUs", e.g. nvidia tesla. running with
consumer grade GPUs is possible, but requires suitable cases and power
supplies, plus the expertise to spec, configure, set up, and
operate those correctly. you also need to have a contingency plan to
deal with hardware failures, as your warranty is typically more
limited on consumer grade hardware. and some of it is not designed to
be run under full load 24/7.
- consumer grade GPUs require use of single or mixed precision
floating point math or else your acceleration will be very limited.
this means that some operations have a larger error than when running
with CPUs. this becomes particularly noticeable when computing
stresses, which are very sensitive to floating point accuracy. so if
you need to do a lot of pressure/stress computation or run frequently
with fix npt or fix press/berendsen, using GPUs in single or mixed
precision may prove more troublesome than running on CPUs.
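to make the precision point concrete, here is a small python sketch (not lammps itself, just an illustration of the underlying floating point behavior with made-up numbers) showing how a naive sequential accumulation loses far more accuracy in single precision than in double:

```python
import numpy as np

# sequentially accumulate one million small "force-like" contributions,
# the way a naive summation loop would, in single and double precision.
n = 1_000_000
vals = np.full(n, 1.0e-4)

# cumsum rounds the running total at every step, so the last element is
# what a naive sequential accumulation would produce in that precision
total64 = np.cumsum(vals.astype(np.float64))[-1]
total32 = np.cumsum(vals.astype(np.float32))[-1]

exact = n * 1.0e-4  # 100.0
print("float64 error:", abs(float(total64) - exact))
print("float32 error:", abs(float(total32) - exact))
```

real summation kernels are smarter than this naive loop, but the gap between the two precisions is the point: quantities like stresses sit at the end of long chains of such accumulations.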
- keep in mind that accelerator devices will usually have to be shared
across multiple CPU cores, which will limit the available acceleration
capacity per CPU core. e.g. with 2 high-end GPUs for a 20 core node,
you may get less than a 2x speedup from the GPUs compared to running
on the CPU cores only.
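to put rough numbers on that, here is a back-of-the-envelope sketch; the model and every input (the 15x per-GPU figure, the 80% offloadable fraction) are illustrative assumptions, not lammps measurements:

```python
# back-of-the-envelope model of sharing a few GPUs among many CPU cores.
# every number below is a made-up illustration, not a measurement.

def node_speedup(cpu_cores, gpu_count, gpu_vs_core, accel_fraction):
    """estimated speedup of a node vs. running on its CPU cores alone.

    gpu_vs_core:    how much faster one GPU is than a SINGLE cpu core
                    on the accelerated part of the calculation.
    accel_fraction: fraction of the runtime that can be offloaded
                    (pair styles, neighbor lists, ...).
    """
    # per-core share of the combined GPU throughput, in "cpu core" units
    share = gpu_count * gpu_vs_core / cpu_cores
    # if the shared GPUs are no faster than the cores themselves,
    # offloading buys nothing for that part of the run
    accel_time = accel_fraction / share if share > 1.0 else accel_fraction
    return 1.0 / ((1.0 - accel_fraction) + accel_time)

# e.g. a 20-core node with 2 GPUs, each GPU ~15x one core, 80% offloadable:
print(node_speedup(cpu_cores=20, gpu_count=2, gpu_vs_core=15.0,
                   accel_fraction=0.8))
# the whole node gains well under 2x, even though each GPU is 15x one core
```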
- when getting a quote for a cluster, also get quotes for alternative
approaches (possibly also from multiple vendors): check out getting
several (dual-socket) workstations instead (with and without GPUs) and
also check how many consumer grade "gaming" PCs you can get. and
compare. keep in mind that consumer grade hardware typically needs
significantly more maintenance effort and has more hardware failures.
so you must not only compare the capability of the hardware, but also
its resilience, potential downtimes, and maintenance effort.
- keep in mind that vendors have a preference to push the next
generation hardware (even if there is no benefit for you), and push
the most extreme configurations (highest clock, most cores, cheapest
components). the optimum is usually at a point where things are well
balanced. unfortunately, the extremely confusing and huge number of
hardware variants makes finding a good combination harder rather than
easier. add to that the fact that some manufacturers arbitrarily cripple
certain hardware in order to make the more extreme, higher-margin
hardware more attractive, and you'll see why finding an optimal
hardware configuration is such a large effort.