I modified that preset my-kokkos-sycl-intel.cmake to include the settings that generate AOT kernel compiles. I also turned off OpenMP because it’s nothing but trouble.
It turns out the cmake file I uploaded is no good. The hardcoded MPI values are not correct, so no parallel runs are possible. Annoyingly, instead of complaining about it, cmake just silently fell back to the built-in MPI stubs. After I commented out the line for the MPI executable and turned on BUILD_MPI instead, to let cmake find MPI on its own, cmake was able to find and use it.
In addition to the variables @stamoor suggested (Jan 4), I also needed to unset ZE_AFFINITY_MASK. For some reason, the latest Intel MPI refuses to do gpu/aware when that variable is set.
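For anyone repeating this, here is a minimal sketch of the ZE_AFFINITY_MASK step, assuming a POSIX shell and that the variable was exported earlier (for example by a job script or module). It only clears the variable and confirms it is gone. The other variables from the Jan 4 post and the actual launch command are not shown here.

```shell
# Clear the Level Zero affinity mask in the shell that will launch the job.
# unset succeeds even if the variable was never set.
unset ZE_AFFINITY_MASK

# Confirm it is really gone: ${VAR+x} expands to "x" only when VAR is set.
if [ -z "${ZE_AFFINITY_MASK+x}" ]; then
    echo "ZE_AFFINITY_MASK is not set"
else
    echo "ZE_AFFINITY_MASK is still set to: ${ZE_AFFINITY_MASK}"
fi

# Launch LAMMPS from this same shell afterwards, with your usual MPI command.
```

The unset has to happen in the same shell (or job script) that starts the MPI launcher, since child processes inherit the environment from it.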
After rebuilding and setting all these annoying variables, I can now fully parallelize across multiple PVC GPUs. Even the kspace for the Rhodopsin benchmark is working correctly. This is very exciting indeed.
I put my build in a container. If someone you know has PVCs and would like to try my build, I can share it.