optimal CPU to couple with a GPU

Aziz_Gombi · August 31, 2018, 9:00am

Dear LAMMPS developers and users

Our research group aims to buy a Supermicro system with a single motherboard, one GPU (1080 Ti), and two CPUs.

Our simulated systems are mainly Lennard-Jones particles, whose pair styles have been developed for GPU calculations. However, we know that some parts of the calculation, mostly Kspace, inevitably have to be handled by the CPU. So we believe the CPU will be the bottleneck, and that choosing the right one is key to getting the fastest system.
Could anybody share ideas on this problem?
The available CPUs are as follows:
(1) Intel® Xeon® E5-2687Wv4 (3.0GHz/12-core/30MB/160W)
(2) Intel® Xeon® E5-2667v4 (3.2GHz/8-core/25MB/135W)
(3) Intel® Xeon® E5-2643v4 (3.4GHz/6-core/20MB/135W)

Could anybody say which CPU would give the fastest calculations in conjunction with the GPU?

Many thanks in advance.

Ray_Shan1 · August 31, 2018, 12:40pm

These three CPUs don’t make much difference in my opinion. It comes down to whether you want more but slower cores or fewer but faster cores, which should depend on your typical systems.

Ray
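
To make the "more but slower versus fewer but faster" trade-off concrete, here is a minimal Python sketch that tabulates the three CPUs listed above. The "core-GHz" figure (base clock times core count) is only a crude proxy chosen for illustration; it is not a LAMMPS benchmark and it ignores Turbo Boost, memory bandwidth, and everything the GPU does.

```python
# CPU options quoted in the thread: (name, base clock in GHz, cores, TDP in W)
cpus = [
    ("E5-2687W v4", 3.0, 12, 160),
    ("E5-2667 v4", 3.2, 8, 135),
    ("E5-2643 v4", 3.4, 6, 135),
]


def aggregate_ghz(clock_ghz, cores):
    """Crude throughput proxy (an assumption): base clock times core count."""
    return clock_ghz * cores


for name, clock, cores, tdp in cpus:
    total = aggregate_ghz(clock, cores)
    print(f"{name}: {cores} cores x {clock} GHz = {total:.1f} core-GHz, "
          f"{tdp / cores:.1f} W per core")
```

The per-core clock differs by only about 13% between the slowest and fastest option, while the core count differs by a factor of two, so which one is preferable depends on how well the CPU-side work scales across cores.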

Aziz_Gombi · August 31, 2018, 1:08pm

Thanks, Shan.

For running a huge problem, which CPU is best to couple with the GPU?
I mean, what’s more critical: having more cores or a higher CPU frequency?

Axel may have some ideas to share.

Anders_Hafreager1 · August 31, 2018, 1:47pm

I did some testing on CPUs only (tried a few different ones), and LAMMPS seemed to favor high CPU frequency over many cores (in performance per $).

Anders

Aziz_Gombi · August 31, 2018, 1:51pm

And which is the best CPU to work alongside the GPU?

(1) Intel® Xeon® E5-2687Wv4 (3.0GHz/12-core/30MB/160W)
(2) Intel® Xeon® E5-2667v4 (3.2GHz/8-core/25MB/135W)
(3) Intel® Xeon® E5-2643v4 (3.4GHz/6-core/20MB/135W)

akohlmey · August 31, 2018, 2:03pm

If you go with a single consumer-grade GPU, you are wasting money with any of those CPUs. Get a single-socket quad-core or six-core “gaming” machine and consider a second GPU instead of a dual-socket Xeon system.

Aziz_Gombi · August 31, 2018, 2:08pm

Dear Axel
Thanks for your reply.
So, you suggest a Core i-series CPU on a single-socket motherboard with two GPUs? Note that I use GTX cards.

Stefan_Paquay · August 31, 2018, 2:13pm

Just to get a general idea: I get very neat performance on a GTX 1080 Ti with a single Intel Core i7-4790, even though that CPU is getting “old” now. Said GPU right now is around $720 so you can get two of those for the price of one Xeon.

akohlmey · August 31, 2018, 2:16pm

A single socket will provide sufficient PCIe lanes to fully support two GPUs. It doesn’t matter whether they are Tesla, Quadro, or GeForce.
Some dual-socket mainboards have the same setup, i.e. only one CPU drives the PCIe bus. But then you also have many-GPU mainboards which can host up to 8 GPUs by using switches (which is not helpful, as 2 or 4 GPUs will share the bus).

Beyond providing full-performance PCIe lanes and being able to physically host and support two double-width cards, there is not much of a difference between CPUs if the pair style you are using is well accelerated by the GPU package. If you want to use KOKKOS, you must not use GeForce, since KOKKOS only supports double precision at the moment.

The CPUs only matter if a lot of the computation is not on the GPU, and then I would consider dropping the GPU altogether and getting 12-14 cores per socket.
When you want to simulate very large systems or do calculations with NPT, I recommend against using GeForce GPUs, as single or mixed precision introduces too large an error in my personal opinion.

axel.
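
The point that the CPU only matters when a lot of the work stays off the GPU can be illustrated with a simple Amdahl-style estimate. The sketch below is a simplified serial model with made-up numbers (they are not measurements, and the function and parameter names are hypothetical); it also ignores any overlap between CPU and GPU work, which a real run may have.

```python
def run_time(pair_time, other_time, gpu_speedup, cpu_speedup=1.0):
    """Estimated wall time when only the pair computation is offloaded.

    pair_time:   time spent in the pair style on the reference CPU
    other_time:  time spent in everything that stays on the CPU (e.g. Kspace)
    gpu_speedup: hypothetical speedup of the pair part on the GPU
    cpu_speedup: hypothetical speedup of the CPU part from a faster CPU
    """
    return pair_time / gpu_speedup + other_time / cpu_speedup


# Illustrative time splits only (arbitrary units), not benchmark data.
cases = {
    "pair-dominated": dict(pair_time=90.0, other_time=10.0),
    "CPU-heavy": dict(pair_time=40.0, other_time=60.0),
}

for label, case in cases.items():
    base = run_time(gpu_speedup=10.0, **case)
    faster_cpu = run_time(gpu_speedup=10.0, cpu_speedup=1.2, **case)
    print(f"{label}: {base:.1f} -> {faster_cpu:.1f} time units "
          f"with a 20% faster CPU")
```

Under these assumed numbers, the faster CPU helps the CPU-heavy case noticeably, which is also the case where the GPU contributes least to begin with.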

Aziz_Gombi · August 31, 2018, 2:22pm

I’m going to run a large-scale problem, over 100K particles for about 100 ns.
Also, my group aims to buy a dual-socket motherboard for projects that are not supported on the GPU.
Given two Xeon CPUs, what’s the best (fastest) configuration: doubling the number of GPUs, increasing the number of cores, or selecting the highest frequency? It’s really a permutation problem!

akohlmey · August 31, 2018, 2:31pm

This is impossible to predict. It depends, for example, on the operating temperature, as that influences how well you can make use of Turbo Boost, which can be significant for the high-core-count Xeons.

We have a bunch of Intel Xeon E5-2690 v4 CPUs (14 cores, 2.6 GHz), which run at around 3.0 GHz for LAMMPS with the regular pair styles (which make little to no use of AVX, unlike USER-INTEL or KOKKOS), but the clock fluctuates a little depending on how hot the neighboring nodes get.
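
To see what clock a machine actually sustains under load, one option on Linux is to sample the "cpu MHz" lines in /proc/cpuinfo while a job is running. The following is a minimal sketch under that assumption; the helper names are hypothetical, and the "cpu MHz" field is not present on every platform or kernel.

```python
from pathlib import Path


def parse_cpu_mhz(cpuinfo_text):
    """Return the per-core clock readings (MHz) found in /proc/cpuinfo text."""
    readings = []
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "cpu MHz":
            readings.append(float(value))
    return readings


def summarize(readings):
    """Return (min, mean, max) of the clock readings, or None if empty."""
    if not readings:
        return None
    return min(readings), sum(readings) / len(readings), max(readings)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only; assumed to expose "cpu MHz"
    if path.exists():
        stats = summarize(parse_cpu_mhz(path.read_text()))
        if stats:
            lo, mean, hi = stats
            print(f"core clocks: min {lo:.0f}, mean {mean:.0f}, max {hi:.0f} MHz")
```

Running this a few times during a simulation gives a rough idea of how far above the base clock the cores actually run.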

I usually recommend against the kind of compromise, multi-purpose, one-size-fits-all machine you seem to be looking for. You will always end up wasting money on hardware that sits unused. A single-socket machine with two consumer GPUs will give you a lot of bang for rather little money. Then spend the rest of the budget on whatever you can get for the CPU-only work. You should at least do the math for this setup before you decide against it.

As for your specific question: my recommendation is that it doesn’t matter. It will be a bad compromise either way, and it is not worth speculating over bad choices.

axel.