FYI, many Kokkos styles run faster on the GPU if you use 12 or fewer atom types. With 12 or fewer types, the i-j pair parameters are kept in stack memory; with more than 12, they are read from global memory, which is generally slower.
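To make the threshold concrete, here is a minimal host-side C++ analogy. It is not the actual LAMMPS/Kokkos code. The names `kMaxStackTypes`, `PairParams`, and `PairTable` are hypothetical, and `std::vector` only stands in for a device global-memory allocation. When the type count fits under the limit, parameters live in a fixed-size array whose size is known at compile time. Otherwise, lookups go through a dynamically sized table.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical threshold mirroring the 12-atom-type limit described above.
constexpr int kMaxStackTypes = 12;

// Hypothetical per-pair parameters (e.g. Lennard-Jones epsilon and sigma).
struct PairParams {
  double epsilon;
  double sigma;
};

// Parameter table for types 1..ntypes (index 0 unused, since atom types are 1-based).
class PairTable {
 public:
  explicit PairTable(int ntypes)
      : ntypes_(ntypes), use_fixed_(ntypes <= kMaxStackTypes) {
    if (!use_fixed_) {
      // Fallback: size chosen at run time (stands in for a global-memory array).
      heap_.resize(static_cast<std::size_t>(ntypes + 1) * (ntypes + 1));
    }
  }

  bool uses_fixed_array() const { return use_fixed_; }

  PairParams& at(int i, int j) {
    assert(i >= 1 && i <= ntypes_ && j >= 1 && j <= ntypes_);
    if (use_fixed_) return fixed_[i][j];
    return heap_[static_cast<std::size_t>(i) * (ntypes_ + 1) + j];
  }

 private:
  int ntypes_;
  bool use_fixed_;
  // Compile-time-sized array embedded directly in the object.
  std::array<std::array<PairParams, kMaxStackTypes + 1>, kMaxStackTypes + 1> fixed_{};
  // Dynamically sized table used only when ntypes exceeds the threshold.
  std::vector<PairParams> heap_;
};
```

The point of the fixed-size path is that its dimensions are compile-time constants, so the table can travel with the object itself instead of requiring a separate allocation. That is why the cutoff is a hard-coded number rather than something that scales with the system.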
Stan