For any discussion of this kind, it is very important that you report exactly which LAMMPS version you are using and how it was compiled. If you capture the output from `lmp -help`
and report everything up to the line “List of individual style options included in this LAMMPS executable”, that should contain almost all of the useful information.
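For example, one way to capture that header (assuming the executable is named `lmp` and is on your PATH; adjust the name if your build produced something like `lmp_mpi`):

```sh
# Save the version/compiler/package header of "lmp -help" to a file;
# paste everything up to the "List of individual style options ..." line.
lmp -help > lammps_info.txt
```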
Any advice given by the LAMMPS developers will usually refer to the latest (feature) release, which is also what the default online documentation corresponds to. Currently, that is LAMMPS version 7 Feb 2024.
The pair style hybrid documentation explicitly lists this:
Accelerator Variants: hybrid/overlay/kk
When you run with the GPU package, no special pair style is needed, but you need to keep building the neighbor list on the CPU (LAMMPS should tell you this). This is done with `-sf gpu -pk gpu 0 neigh no`.
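Put together, such a launch could look roughly like this (a sketch only; the MPI launcher, rank count, and the input file name `in.system` are placeholders for your setup):

```sh
# GPU package: offload supported styles to the GPU, build neighbor lists on the CPU
mpirun -np 16 lmp -in in.system -sf gpu -pk gpu 0 neigh no
```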
When you run with the KOKKOS package, you either need a GPU-aware MPI library, or you need to tell LAMMPS that yours is not GPU-aware with `-pk kokkos gpu/aware off`. The segfault you are seeing is likely a consequence of that.
This still doesn’t tell me anything about the hardware, but for this kind of request you should be using a LAMMPS version that includes KOKKOS support for both OpenMP and GPUs (= CUDA) and then use `-k on g 4 t 8`.
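Combining the two points above, a KOKKOS launch on a node with 4 GPUs could look roughly like this (a sketch, assuming one MPI rank per GPU, an input file named `in.system`, and an MPI library that is not GPU-aware; `-sf kk` selects the /kk style variants):

```sh
# KOKKOS package: 4 GPUs, 8 OpenMP threads per MPI rank, GPU-aware MPI disabled
mpirun -np 4 lmp -in in.system -k on g 4 t 8 -sf kk -pk kokkos gpu/aware off
```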
With the GPU package, you should have an executable that also includes the OPENMP package, and then you can use `-sf hybrid gpu omp -pk gpu 0 neigh no`.
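Again only a sketch (rank count, thread count, and input file name are placeholders; `-pk omp 2` sets the number of OpenMP threads for the omp styles):

```sh
# GPU package plus OPENMP package: gpu suffix where available, omp variants elsewhere
mpirun -np 16 lmp -in in.system -sf hybrid gpu omp -pk gpu 0 neigh no -pk omp 2
```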
In both cases, this adds OpenMP multi-threading where no GPU acceleration is available. But for the GPU package, you may also change your resource request to something like:
--ntasks-per-node=16   # Number of MPI ranks per node
--cpus-per-task=2      # Number of threads per MPI rank
--gres=gpu:4           # Number of requested GPUs per node, can vary between 1 and 4
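To make the mapping concrete, here is a minimal sketch of how such a Slurm job could be put together (partition, module setup, and walltime are omitted, and the input file name is a placeholder; check with your cluster documentation whether `srun` or `mpirun` is the right launcher there):

```sh
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16   # MPI ranks per node
#SBATCH --cpus-per-task=2      # OpenMP threads per MPI rank
#SBATCH --gres=gpu:4           # GPUs per node

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun lmp -in in.system -sf hybrid gpu omp -pk gpu 0 neigh no -pk omp ${OMP_NUM_THREADS}
```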
The GPU package is rather efficient when attaching multiple MPI tasks to the same GPU: since it only offloads part of the calculation to the GPU, it achieves higher GPU occupancy this way and parallelizes the non-GPU part better. Of course, it would be even better if there were a way to request the CUDA Multi-Process Service (MPS), but to use that correctly you need to contact your local user support.
Yes, it does. I had an account for a project there recently (but not anymore and I didn’t run LAMMPS on it).