The attached file fixes the problem of the zeros that started to appear. The cause was a missing update of a device pointer (for the arrays holding the per-atom energy and virial) after nmax (the maximum total number of atoms on a process) was increased during a reneighboring step in which the energy of the system was not calculated.
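To illustrate the kind of bug described above, here is a minimal host-only sketch. It is not the code from cuda.cpp: all names (`HostArray`, `DeviceView`, `Sim`, `reneighbor_buggy`, `reneighbor_fixed`, `eflag`) are hypothetical, and a plain host pointer stands in for the device pointer. It also assumes the pointer refresh was tied to the energy calculation, which is one plausible way such an update gets skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-atom array (e.g. per-atom energy).
struct HostArray {
    std::vector<double> data;
    // Growing the array may move the underlying storage.
    void grow(std::size_t nmax) { data.assign(nmax, 0.0); }
};

// Hypothetical stand-in for the GPU-side view that caches pointer and size.
struct DeviceView {
    double* ptr = nullptr;
    std::size_t nmax = 0;

    void rebind(HostArray& a) {
        ptr = a.data.data();
        nmax = a.data.size();
    }
    // The view is current only if both pointer and size match the array.
    bool is_current(const HostArray& a) const {
        return ptr == a.data.data() && nmax == a.data.size();
    }
    double& at(std::size_t i) {
        assert(i < nmax);  // guards against indexing past the cached size
        return ptr[i];
    }
};

struct Sim {
    HostArray eatom;
    DeviceView d_eatom;
    std::size_t nmax = 0;

    // Buggy variant (assumed): the view is refreshed only when the
    // energy is computed on this step (eflag == true).
    void reneighbor_buggy(std::size_t new_nmax, bool eflag) {
        if (new_nmax > nmax) {
            nmax = new_nmax;
            eatom.grow(nmax);
            if (eflag) d_eatom.rebind(eatom);
        }
    }

    // Fixed variant: the view is refreshed whenever nmax grows,
    // regardless of whether the energy is computed on this step.
    void reneighbor_fixed(std::size_t new_nmax, bool /*eflag*/) {
        if (new_nmax > nmax) {
            nmax = new_nmax;
            eatom.grow(nmax);
            d_eatom.rebind(eatom);
        }
    }
};
```

In the buggy variant, a reneighboring step that grows nmax without an energy calculation leaves `d_eatom` describing the old, smaller array, so later per-atom results land in stale storage. The fixed variant ties the refresh to the reallocation itself.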
I was not able to reproduce the differing values between pe and c_pe, though.
Is that issue reproducible on your machine?
cuda.cpp (31.5 KB)