LAMMPS performance on CUDA: single vs. double precision

Sangamesh_Banappa · May 10, 2013, 10:20am

Dear LAMMPS users,

Below are the numbers we got from the LAMMPS single- and double-precision benchmarks. The input file used is in.lj.cuda (taken from the bench/GPU folder of the LAMMPS distribution).

#  Atoms   Steps   Single loop time (s)   Double loop time (s)
     256     100   0.10108                0.102101
    2048     100   0.111682               0.110275
   16384     100   0.139425               0.144047
  131072     100   0.393052               0.387888

     256    1000   0.899185               0.906075
    2048    1000   1.02966                1.03671
   16384    1000   1.3074                 1.31008
  131072    1000   3.7608                 3.73454

     256   10000   9.04168                8.99344
    2048   10000   10.2678                10.2985
   16384   10000   12.9105                13.1372
  131072   10000   38.6626                39.0364

We compared these results with those published on the LAMMPS website:

http://lammps.sandia.gov/bench/gpu.desktop.lj.single.jpg
http://lammps.sandia.gov/bench/gpu.desktop.lj.double.jpg

The graphs show a clear difference between the single- and double-precision results. For example, the peak performance is approximately 40 million atom-timesteps per second for single precision and 25 million for double precision. In our results, however, the two are almost the same. How is this possible? We're sure that both the single- and double-precision builds were compiled correctly.
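To put the table in the same units as the published plots (millions of atom-timesteps per second), throughput is atoms × steps / loop time. Below is a minimal Python sketch, assuming the loop times in the table are wall-clock seconds as LAMMPS reports them; the function and variable names are illustrative only.

```python
# Convert benchmark loop times into millions of atom-timesteps per second,
# the unit quoted for the published LAMMPS GPU plots.

def matom_steps_per_sec(atoms: int, steps: int, loop_time_s: float) -> float:
    """Return throughput in millions of atom-timesteps per second."""
    return atoms * steps / loop_time_s / 1e6

# (atoms, single-precision loop time, double-precision loop time) in seconds,
# copied from the 10000-step rows of the table above.
RUNS_10K = [
    (256, 9.04168, 8.99344),
    (2048, 10.2678, 10.2985),
    (16384, 12.9105, 13.1372),
    (131072, 38.6626, 39.0364),
]

def summarize(steps: int = 10000):
    """Return (atoms, single, double, single/double) throughput rows."""
    rows = []
    for atoms, t_single, t_double in RUNS_10K:
        single = matom_steps_per_sec(atoms, steps, t_single)
        double = matom_steps_per_sec(atoms, steps, t_double)
        rows.append((atoms, single, double, single / double))
    return rows

if __name__ == "__main__":
    print(f"{'atoms':>8} {'single':>8} {'double':>8} {'ratio':>6}")
    for atoms, single, double, ratio in summarize():
        print(f"{atoms:>8} {single:8.2f} {double:8.2f} {ratio:6.3f}")
```

For the 131072-atom, 10000-step runs this works out to about 33.9 (single) vs. 33.6 (double) million atom-timesteps per second, a ratio of roughly 1.01, whereas the approximate 40 and 25 read off the plots imply a ratio of about 1.6.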

Also, it is hard to read exact values off the graphs above; comparison would be easier if the results were available in tabular form. Are they available at some link?

Some details about our setup:

LAMMPS version: 30-Aug-2012
GPU card: NVIDIA Tesla M2090
CUDA version: 4.0

Thanks

sge_sub.sh (304 Bytes)

in.lj.cuda (455 Bytes)

lammps_cuda_single_double (436 Bytes)

sjplimp · May 10, 2013, 2:37pm

Mike can possibly comment. It doesn't look to me like you
ran the same case twice.

Steve

_Brown_W_Michael · May 10, 2013, 2:48pm

On Titan, with Tesla K20X GPUs, I do see that single and double precision perform more similarly for some simulations, because the time spent casting data for host-device transfer becomes significant.

For your setup, however, I do not think this is expected. Can you send the screen output of both builds for a single run, e.g. 131072 atoms and 100 steps? The -screen command-line option can be used to send the screen output to a file. Thanks. - Mike
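Once the -screen output of each build is saved to a file, the summary line LAMMPS prints after a run gives the loop time, step count, and atom count. The Python sketch below assumes that line reads like "Loop time of 0.393052 on 1 procs for 100 steps with 131072 atoms" (the exact wording may differ between LAMMPS versions); the script name and file names are hypothetical.

```python
import re
import sys

# Assumed format of the LAMMPS end-of-run summary line, e.g.
# "Loop time of 0.393052 on 1 procs for 100 steps with 131072 atoms"
LOOP_RE = re.compile(
    r"Loop time of\s+([0-9.eE+-]+)\s+on\s+(\d+)\s+procs\s+"
    r"for\s+(\d+)\s+steps\s+with\s+(\d+)\s+atoms"
)

def parse_loop_line(text: str):
    """Return (loop_time_s, procs, steps, atoms) for the last run, or None."""
    matches = LOOP_RE.findall(text)
    if not matches:
        return None
    t, procs, steps, atoms = matches[-1]
    return float(t), int(procs), int(steps), int(atoms)

def compare(single_text: str, double_text: str) -> float:
    """Return double/single loop-time ratio; > 1 means single precision was faster."""
    s = parse_loop_line(single_text)
    d = parse_loop_line(double_text)
    if s is None or d is None:
        raise ValueError("no 'Loop time of' line found")
    if s[2:] != d[2:]:
        # Different step or atom counts: the two runs are not the same case.
        raise ValueError("runs differ in step count or atom count")
    return d[0] / s[0]

if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_screens.py screen.single screen.double
    with open(sys.argv[1]) as f_single, open(sys.argv[2]) as f_double:
        ratio = compare(f_single.read(), f_double.read())
    print(f"double/single loop-time ratio: {ratio:.3f}")
```

Checking that the step and atom counts match guards against comparing two different inputs; a ratio close to 1.0 means the two builds took about the same time.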

sjplimp · May 10, 2013, 3:06pm

Sorry, that was a typo. I meant to say:
it looks like you ran the same case twice,
i.e. you did not actually run single vs. double precision.

Steve

Sangamesh_Banappa · May 12, 2013, 8:41am

lammps_single_32_100 (1.11 KB)

lammps_double_32_100 (1.11 KB)

32_100st_single_double.tar.gz (1.96 KB)

_Brown_W_Michael · May 12, 2013, 3:42pm

Sorry, I thought you were using the GPU package.

Maybe Christian is a better contact for help with the USER-CUDA package. - Mike

sjplimp · May 13, 2013, 2:17pm

OK, maybe Christian can take a look, since this is USER-CUDA.

Steve