Single-precision KOKKOS?

Michael1 · March 11, 2020, 9:56pm

Sorry, forgot to CC the mailing list.

Thanks for the info. In that case, I was wondering whether there has been any work on GPU-accelerated bond styles, as there is with KOKKOS? In particular, my group (polymer physics research) uses bond_style fene almost exclusively.

Thanks again,
Michael Jacobs

akohlmey · March 11, 2020, 10:20pm

There is no value in doing this for two reasons:

1. There is not much computational work in bonds, so they would not run (much) faster on the GPU than on the CPU, if at all, unless you have a very complex bond potential that is time-consuming to compute (which FENE is not).
2. Bonded interactions are essentially “for free” with the GPU package, since they are computed on the CPU concurrently with the pair styles. The pair styles are much, much more time-consuming and use a nested loop structure instead of a single loop, and thus benefit the most from GPU acceleration.
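
A rough Amdahl's-law estimate makes the first point concrete. The sketch below is only illustrative: the time fractions are assumed values rather than measurements, and the model ignores that the GPU package already overlaps bond computation on the CPU with pair computation on the GPU.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the run time becomes `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Assumed, illustrative time fractions (not measurements):
bond_fraction = 0.10        # bonded interactions
pair_fraction = 2.0 / 3.0   # pair style plus neighbor list

# Even an infinitely fast bond kernel caps the overall gain at roughly 11%.
best_case_bonds = amdahl_speedup(bond_fraction, float("inf"))

# A hypothetical 10x faster pair/neighbor kernel pays off far more.
pair_10x = amdahl_speedup(pair_fraction, 10.0)

print(f"bonds, infinite speedup: {best_case_bonds:.2f}x")
print(f"pair+neighbor, 10x:      {pair_10x:.2f}x")
```

With these assumed fractions, the part that dominates the run time is the only one worth accelerating.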

Please have a look at the “chain” log files in the bench folder. Less than 10% of the total time is spent on bonded interactions, while the pair style and neighbor list (which are ported to the GPU) take two thirds of the time. If you have plenty of CPU cores, you can use multiple MPI ranks per GPU and thus use MPI parallelization to speed up the non-GPU parts while at the same time increasing the GPU occupancy.
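
As a sketch of what "multiple MPI ranks per GPU" can look like in practice, the helper below assembles a launch command that uses the LAMMPS command-line switches `-sf gpu` (apply the gpu suffix to styles), `-pk gpu N` (N GPUs per node) and `-in`. The executable name `lmp`, the `mpirun` launcher, the rank count and the helper function itself are assumptions for illustration; adjust them to your build and cluster.

```python
def lammps_gpu_command(mpi_ranks, gpus_per_node, input_file,
                       exe="lmp", launcher="mpirun"):
    """Build an argument list that runs LAMMPS with the GPU package.

    `exe` and `launcher` are assumed names; they depend on the local setup.
    """
    return [
        launcher, "-np", str(mpi_ranks),
        exe,
        "-sf", "gpu",                       # use gpu-suffixed styles where available
        "-pk", "gpu", str(gpus_per_node),   # number of GPUs per node
        "-in", input_file,
    ]


# Hypothetical example: 8 MPI ranks sharing a single GPU on the chain benchmark.
cmd = lammps_gpu_command(mpi_ranks=8, gpus_per_node=1, input_file="in.chain")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd)` or pasted into a job script. The best number of ranks per GPU depends on the hardware and has to be found by trial.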

Axel.

Michael1 · March 12, 2020, 2:31pm

Thank you so much for taking the time to explain all this. It really helped.

As you’ve suggested, I’ll stick with the GPU package, since we indeed have plenty of CPU cores.

Michael Jacobs

akohlmey · March 12, 2020, 3:42pm

Two final comments:

If you are looking for an “all-GPU” code that has support for FENE bonds and is compatible with single precision, you should look into HOOMD-blue: http://glotzerlab.engin.umich.edu/hoomd-blue/

It took some inspiration from LAMMPS (so the transition is not that difficult), but was designed from the ground up to run well on GPUs and uses Python as its main scripting engine.

When running in single or mixed precision, watch out when running NPT or other variable-cell methods. Unlike forces, where a lot of error cancellation happens, the stress tensor is, under typical conditions (i.e. low pressure), much more sensitive to limited-precision arithmetic. Roughly speaking, you should assume that it is accurate only to the order of magnitude in all-single precision, to about 1.e-3 in mixed precision, and to about 1.e-6 in double precision. For a fixed cell size, stress and pressure are only diagnostics, so larger errors are acceptable. But when they impact the system evolution, care has to be taken: it may be wise to follow up single- or mixed-precision calculations with (short) validations in double precision (e.g. on the CPU).
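
The sensitivity at low pressure can be illustrated with a toy calculation, sketched here with the Python standard library only. It is not how LAMMPS or any GPU code computes the stress tensor: the "virial contributions" are made-up random numbers, and the last term is constructed so that large positive and negative values nearly cancel, leaving a small net result. Mixed precision is not modelled.

```python
import math
import random
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single(values):
    """Sum with every input and every partial sum rounded to single precision."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


def sum_double(values):
    """Plain left-to-right sum in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


# Hypothetical contributions: large magnitudes of both signs ...
rng = random.Random(2020)
terms = [rng.uniform(-1000.0, 1000.0) for _ in range(20000)]
# ... plus one balancing term, so that the net result is small (about 1.2345).
terms.append(1.2345 - math.fsum(terms))

reference = math.fsum(terms)  # correctly rounded sum, used as the reference
rel_err_single = abs(sum_single(terms) - reference) / abs(reference)
rel_err_double = abs(sum_double(terms) - reference) / abs(reference)

print(f"relative error, single precision: {rel_err_single:.1e}")
print(f"relative error, double precision: {rel_err_double:.1e}")
```

Because the rounding error scales with the size of the individual terms while the result is the small remainder after cancellation, the relative error of the single-precision sum is typically many orders of magnitude larger than that of the double-precision sum.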

HTH,
Axel.