Hi all, particularly the developers:
I’ve been using LAMMPS for a couple of years now, mainly with the GPU acceleration package. Our research group has its own cluster, which we built with consumer-grade, single-precision GPUs. I learned about KOKKOS and wanted to give it a try, since it doesn’t have the back-and-forth communication every timestep like the GPU package does. But I was crushed to find out that it only supports double precision. Is there a hard reason for this, or is it just that no one has asked or volunteered enough time? I’ll happily help out if help is needed.
Should I submit this on the github page as well?
Michael Jacobs
The (current) use of double precision stems from the design goal of compiling the same code for different backends. To the best of my knowledge, support for single and mixed precision is under development, but I don’t know the ETA for it. It will require substantial changes to the KOKKOS package in LAMMPS.
The overhead of transferring data is less severe than you might think. Specifically, on machines with plenty of CPU cores and only a few GPUs, the GPU package architecture has significant advantages. It is also compatible with all kinds of fixes and computes, while KOKKOS requires them to be ported to KOKKOS, or else the data needs to be transferred anyway. With the GPU package it is often more efficient to carefully balance the use of CPUs and GPUs (e.g. by running kspace on the CPU and tuning the coulomb cutoff for optimal performance), since both can be used concurrently in the force computation. KOKKOS is usually best on top500-style machines with plenty of GPUs and for very large systems, where double precision is often a must anyway.
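As a rough illustration of that CPU/GPU split, a GPU-package input script can keep PPPM on the CPU while the pair style runs on the GPU (a sketch only; the cutoff and accuracy values here are hypothetical and should be tuned for your own system and hardware):

```
# Sketch: GPU package with kspace kept on the CPU
package       gpu 1                    # use 1 GPU per node
suffix        gpu                      # apply /gpu variants to subsequent styles

pair_style    lj/cut/coul/long 12.0    # runs as lj/cut/coul/long/gpu; a longer
                                       # coulomb cutoff shifts work from CPU kspace
                                       # to the GPU pair computation

suffix        off                      # temporarily disable the suffix ...
kspace_style  pppm 1.0e-4              # ... so PPPM runs on the CPU, concurrently
suffix        on                       # with the GPU pair forces
```

The same effect can be had from the command line with `-sf gpu -pk gpu 1`; the key tuning knob is the coulomb cutoff, which trades GPU pair work against CPU kspace work.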
HTH,
Axel.
Stan (CCd) can comment on the single-precision work for Kokkos.
Steve
Single-precision work for Kokkos is underway, but as Axel said, major changes to the code are required. The ETA is hopefully sometime this year.