Hello everyone,
I have implemented my own pair_style, which is non-reciprocal and long-ranged. In my use case, a single particle has around 80 neighbors on average in the high-density limit.
I am looking to study systems with 1000-10000 particles. In this case, would a GPU implementation of the pair_style be advantageous for speed? If so, can someone guide me on the typical way one would modify a pair_style into a ‘GPU-accelerated’ pair style?
Thanks!
1000-10000 particles is at the lower limit of where GPU acceleration makes sense, especially if you are looking at powerful high-end data center GPUs. At 1000 particles you already run into the strong scaling limit with a single multi-core CPU.
80 neighbors is a rather small number. Typical molecular systems tend to have 300-500 neighbors per atom.
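To put rough numbers on this, here is a back-of-the-envelope count of pair evaluations per timestep, assuming a full neighbor list where every atom visits each of its neighbors once. The function name is made up for illustration; the neighbor counts are the ones quoted in this thread.

```cpp
// Rough count of pair evaluations per timestep with a full neighbor list:
// every atom loops over all of its neighbors once.
// pair_evaluations() is a hypothetical helper, not a LAMMPS function.
long long pair_evaluations(long long natoms, long long neighbors_per_atom) {
    return natoms * neighbors_per_atom;
}
```

With 80 neighbors this gives 80,000 evaluations for 1000 particles and 800,000 for 10000 particles. The same 10000 particles with 300 neighbors each would give 3,000,000, so the workload per step here is small for both reasons at once.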
You have two options: the GPU package and the KOKKOS package. The two differ significantly in how they operate and how they are implemented. See section 7, "Accelerate performance", of the LAMMPS documentation for more info. The best way to learn about it is to look at an existing implementation. Just pick a pair style that is somewhat similar to yours in the flow of control and then compare the corresponding accelerated styles to the CPU version.
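When making that comparison, one pattern to look for is a per-atom loop over a full neighbor list, where each atom accumulates only its own force and nothing is written back to the neighbor. That maps naturally onto one GPU thread (or a small group of threads) per atom, and it also suits a non-reciprocal interaction, since Newton's third law cannot be used to halve the work. Below is a minimal plain C++ sketch of that flow of control. It is not LAMMPS code: `Vec3`, `force_on_i`, `compute_forces`, and the toy force law (repulsion scaled by a per-particle weight of the source particle) are all invented placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Hypothetical non-reciprocal force on atom i due to atom j.
// The magnitude depends on the weight of j only, so F_ij != -F_ji in general.
// This is a placeholder, not the potential discussed in the thread.
inline Vec3 force_on_i(const Vec3& ri, const Vec3& rj, double weight_j, double cutoff) {
    const double dx = ri.x - rj.x;
    const double dy = ri.y - rj.y;
    const double dz = ri.z - rj.z;
    const double r2 = dx * dx + dy * dy + dz * dz;
    if (r2 >= cutoff * cutoff || r2 == 0.0) return {0.0, 0.0, 0.0};
    const double s = weight_j / r2;  // toy repulsion scaled by the source weight
    return {s * dx, s * dy, s * dz};
}

// Full neighbor list: neigh[i] holds every neighbor j of atom i, and i also
// appears in neigh[j]. Each outer iteration is independent, which is what a
// GPU kernel would hand to one thread (or a group of threads).
void compute_forces(const std::vector<Vec3>& x,
                    const std::vector<double>& weight,
                    const std::vector<std::vector<int>>& neigh,
                    double cutoff,
                    std::vector<Vec3>& f) {
    assert(neigh.size() == x.size() && weight.size() == x.size());
    f.assign(x.size(), Vec3{0.0, 0.0, 0.0});
    for (std::size_t i = 0; i < x.size(); ++i) {
        Vec3 fi{0.0, 0.0, 0.0};
        for (int j : neigh[i]) {
            const Vec3 fij = force_on_i(x[i], x[j], weight[j], cutoff);
            fi.x += fij.x;
            fi.y += fij.y;
            fi.z += fij.z;
        }
        f[i] = fi;  // write only to atom i; nothing is scattered to j
    }
}
```

For two particles at (0,0,0) and (1,0,0) with weights 1 and 3 and a cutoff of 2, this gives a force of (-3,0,0) on the first and (1,0,0) on the second, so the two forces do not cancel. A real port would replace the outer loop with the kernel launch machinery of whichever package you choose, which is exactly the part best learned from an existing accelerated style.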