Optimizing pair style with GPU

Hello everyone,
I have implemented my own pair_style, which is non-reciprocal and long-ranged. In my use case, a single particle has around 80 neighbors on average in the high-density limit.

I am looking to study systems of 1000-10000 particles. In this case, would a GPU implementation of the pair_style be advantageous for speed? If so, can someone guide me on the typical way one would convert a pair_style into a ‘gpu accelerated’ pair style?

Thanks!

1000-10000 particles is at the lower limit of where GPU acceleration makes sense, especially if you are looking at powerful high-end data center GPUs. At 1000 particles you already run into the strong scaling limit with a single multi-core CPU.

80 neighbors is a rather small number. Typical molecular systems tend to have 300-500 neighbors per atom.
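To put those numbers side by side, here is a rough count of pair evaluations per timestep. It is only a back-of-the-envelope sketch: it assumes every atom loops over all of its neighbors (no sharing of work between i and j, as a non-reciprocal interaction would need), and the function name is made up for illustration.

```python
def pair_evaluations(n_atoms, neighbors_per_atom):
    """Pair evaluations per timestep if every atom visits all its neighbors."""
    return n_atoms * neighbors_per_atom

# System sizes from the question, with the quoted neighbor counts:
# 80 (this pair style) versus 300 and 500 (typical molecular systems).
for n_atoms in (1_000, 10_000):
    for neighbors in (80, 300, 500):
        count = pair_evaluations(n_atoms, neighbors)
        print(f"{n_atoms:>6} atoms x {neighbors:>3} neighbors = {count:>9,} pair evaluations")
```

At the same number of atoms, 80 neighbors gives roughly 4 to 6 times less work per step than a typical molecular system, so there is less for an accelerator to parallelize.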

You have two options: the GPU package and the KOKKOS package. They differ significantly in how they operate and how they are implemented. See the "Accelerate performance" section (section 7) of the LAMMPS documentation for more information. The best way to learn is to look at existing implementations: pick a pair style whose flow of control is similar to yours, then compare its accelerated variants to the CPU version.
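To illustrate what "flow of control" means here, below is a minimal sketch of the loop structure of a non-reciprocal pair interaction. It is plain Python, not LAMMPS code, and everything in it is an assumption made for illustration: the names, the 2D positions, the per-particle `strength` array, and the placeholder force law. Each atom i walks over all of its neighbors and writes only to its own force entry; nothing is ever added to atom j.

```python
def compute_forces(positions, neighbors, strength):
    """Per-atom force loop over a full neighbor list (illustrative only).

    positions : list of (x, y) tuples
    neighbors : neighbors[i] lists every j within the cutoff of i, so each
                pair appears twice (j in neighbors[i] and i in neighbors[j])
    strength  : hypothetical per-particle source strength; the force on i
                due to j scales with strength[j], which makes the
                interaction non-reciprocal when strengths differ
    """
    forces = []
    for i, (xi, yi) in enumerate(positions):
        fx = fy = 0.0
        for j in neighbors[i]:
            xj, yj = positions[j]
            dx, dy = xi - xj, yi - yj
            r2 = dx * dx + dy * dy
            # Placeholder force law, NOT a real potential.
            fpair = strength[j] / r2
            fx += fpair * dx
            fy += fpair * dy
        # Only atom i is updated; there is no "equal and opposite"
        # contribution written to atom j.
        forces.append((fx, fy))
    return forces


# Two particles with different strengths: the two forces do not cancel.
demo = compute_forces(
    positions=[(0.0, 0.0), (1.0, 0.0)],
    neighbors=[[1], [0]],
    strength=[1.0, 3.0],
)
print(demo)
```

Because no iteration of the outer loop writes to another atom's force, the iterations are independent of each other. That independence is the property to look for when picking an existing pair style to compare against, since such a loop can be handed out one atom per thread without write conflicts.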