Kokkos memory allocation error using PPPM

Hello everyone,

I’ve been struggling with this error for the past week, and I can’t seem to find my way around the issue.

I am trying to adapt my input to run on GPU through the Kokkos package, but whenever I change the kspace_style to pppm instead of ewald, I instantly get a memory allocation error. I am using Dreiding parameters and currently running the stable version of LAMMPS.

My input script is as simple as it gets, straight from the LAMMPS-Interface:

log             log.COF-42 append
units           real
atom_style      full
boundary        p p p

pair_style      lj/cut/coul/long 12.500
bond_style      harmonic
angle_style     cosine/squared
dihedral_style  charmm
improper_style  umbrella
kspace_style    pppm 1.0e-6

special_bonds   dreiding
dielectric      1.0
pair_modify     tail yes mix arithmetic
box tilt        large
read_data       data.COF-42

#### Atom Groupings ####
group           fram     id   1:924
#### END Atom Groupings ####

minimize        1.0e-4 1.0e-6 1000 10000

And I get the following error every time:

LAMMPS (23 Jun 2022 - Update 1)
KOKKOS mode is enabled (src/KOKKOS/kokkos.cpp:105)
  will use up to 2 GPU(s) per node
WARNING: Turning off GPU-aware MPI since it is not detected, use '-pk kokkos gpu/aware on' to override (src/KOKKOS/kokkos.cpp:293)
  using 16 OpenMP thread(s) per MPI task
Reading data file ...
  triclinic box = (0 0 0) to (29.9768 25.96067 28.35) with tilt (-14.9884 0 0)
  1 by 1 by 2 MPI processor grid
  reading atoms ...
  924 atoms
  scanning bonds ...
  3 = max bonds/atom
  scanning angles ...
  6 = max angles/atom
  scanning dihedrals ...
  12 = max dihedrals/atom
  scanning impropers ...
  3 = max impropers/atom
  reading bonds ...
  966 bonds
  reading angles ...
  1596 angles
  reading dihedrals ...
  2100 dihedrals
  reading impropers ...
  1008 impropers
Finding 1-2 1-3 1-4 neighbors ...
  special bond factors lj:    0        0        1
  special bond factors coul:  0        0        1
     4 = max # of 1-2 neighbors
     6 = max # of 1-3 neighbors
     9 = max # of special neighbors
  special bonds CPU = 0.001 seconds
  read_data CPU = 0.043 seconds
924 atoms in group fram
WARNING: Using 'neigh_modify every 1 delay 0 check yes' setting during minimization (src/min.cpp:187)
PPPM initialization ...
  using 12-bit tables for long-range coulomb (src/kspace.cpp:342)
  G vector (1/distance) = 0.25582581
  grid = 24 32 24
  stencil order = 5
  estimated absolute RMS force accuracy = 0.00020091006
  estimated relative force accuracy = 6.0503468e-07
  using double precision FFTW3
  3d grid and FFT values/proc = 343 9216
terminate called after throwing an instance of 'std::runtime_error'
terminate called after throwing an instance of 'std::runtime_error'
  what():  Kokkos failed to allocate memory for label "GridComm:swap_packlist".  Allocation using MemorySpace named "Cuda" failed with the following error:  Allocation of size 1.718e+10 G failed, likely due to insufficient memory.  (The allocation mechanism was cudaMalloc().  The Cuda allocation returned the error code ""cudaErrorMemoryAllocation".)

  what():  Kokkos failed to allocate memory for label "GridComm:swap_packlist".  Allocation using MemorySpace named "Cuda" failed with the following error:  Allocation of size 1.718e+10 G failed, likely due to insufficient memory.  (The allocation mechanism was cudaMalloc().  The Cuda allocation returned the error code ""cudaErrorMemoryAllocation".)

[vision1:3276138] *** Process received signal ***
[vision1:3276139] *** Process received signal ***
[vision1:3276138] Signal: Aborted (6)
[vision1:3276138] Signal code:  (-6)
[vision1:3276139] Signal: Aborted (6)
[vision1:3276139] Signal code:  (-6)
[vision1:3276138] [ 0] [vision1:3276139] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0)[0x7f6c1b4463c0]
[vision1:3276139] [ 1] /lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0)[0x7f821e1f43c0]
[vision1:3276138] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f821db7503b]
[vision1:3276138] [ 2] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f6c1adc703b]
[vision1:3276139] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f6c1ada6859]
[vision1:3276139] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f821db54859]
[vision1:3276138] [ 3] /opt/software/software/GCCcore/10.3.0/lib64/libstdc++.so.6(+0xaddcc)[0x7f6c1b1cddcc]
[vision1:3276139] [ 4] /opt/software/software/GCCcore/10.3.0/lib64/libstdc++.so.6(+0xaddcc)[0x7f821df7bdcc]
[vision1:3276138] [ 4] /opt/software/software/GCCcore/10.3.0/lib64/libstdc++.so.6(+0xb8e36)[0x7f6c1b1d8e36]
[vision1:3276139] [ 5] /opt/software/software/GCCcore/10.3.0/lib64/libstdc++.so.6(+0xb8e36)[0x7f821df86e36]
[vision1:3276138] [ 5] /opt/software/software/GCCcore/10.3.0/lib64/libstdc++.so.6(+0xb8ea1)[0x7f821df86ea1]
[vision1:3276138] [ 6] /opt/software/software/GCCcore/10.3.0/lib64/libstdc++.so.6(+0xb8ea1)[0x7f6c1b1d8ea1]
[vision1:3276139] [ 6] /opt/software/software/GCCcore/10.3.0/lib64/libstdc++.so.6(+0xb9134)[0x7f821df87134]
[vision1:3276138] [ 7] /opt/software/software/GCCcore/10.3.0/lib64/libstdc++.so.6(+0xb9134)[0x7f6c1b1d9134]
[vision1:3276139] [ 7] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN6Kokkos4Impl23throw_runtime_exceptionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x35)[0x7f6c1c5d3ef7]
[vision1:3276139] [ 8] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN6Kokkos4Impl23throw_runtime_exceptionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x35)[0x7f821f381ef7]
[vision1:3276138] [ 8] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN6Kokkos4Impl41safe_throw_allocation_with_header_failureERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_RKNS_12Experimental26RawMemoryAllocationFailureE+0x241)[0x7f8223378cf1]
[vision1:3276138] [ 9] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN6Kokkos4Impl41safe_throw_allocation_with_header_failureERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_RKNS_12Experimental26RawMemoryAllocationFailureE+0x241)[0x7f6c205cacf1]
[vision1:3276139] [ 9] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN6Kokkos4Impl30checked_allocation_with_headerINS_9CudaSpaceEEEPNS0_22SharedAllocationHeaderERKT_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEm+0x5e)[0x7f8223384d4e]
[vision1:3276138] [10] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN6Kokkos4Impl30checked_allocation_with_headerINS_9CudaSpaceEEEPNS0_22SharedAllocationHeaderERKT_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEm+0x5e)[0x7f6c205d6d4e]
[vision1:3276139] [10] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN6Kokkos4Impl22SharedAllocationRecordINS_9CudaSpaceEvEC1ERKS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmPFvPNS1_IvvEEE+0x2c)[0x7f8223381f4c]
[vision1:3276138] [11] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN6Kokkos4Impl22SharedAllocationRecordINS_9CudaSpaceEvEC1ERKS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmPFvPNS1_IvvEEE+0x2c)[0x7f6c205d3f4c]
[vision1:3276139] [11] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN6Kokkos4ViewIPPiJNS_11LayoutRightENS_4CudaEvEEC1IJNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEERKNS_4Impl12ViewCtorPropIJDpT_EEERKNSt9enable_ifIXntsrSH_11has_pointerES3_E4typeE+0x1cc)[0x7f821f827f7c]
[vision1:3276138] [12] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN6Kokkos4ViewIPPiJNS_11LayoutRightENS_4CudaEvEEC1IJNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEERKNS_4Impl12ViewCtorPropIJDpT_EEERKNSt9enable_ifIXntsrSH_11has_pointerES3_E4typeE+0x1cc)[0x7f6c1ca79f7c]
[vision1:3276139] [12] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN6Kokkos11impl_resizeIJEPPiJNS_11LayoutRightENS_4CudaEvEEENSt9enable_ifIXoosrSt7is_sameINS_4ViewIT0_JDpT1_EE12array_layoutENS_10LayoutLeftEE5valuesrS6_ISC_S3_E5valueEvE4typeERSB_mmmmmmmmDpRKT_+0x161)[0x7f6c1ca7dd21]
[vision1:3276139] [13] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN6Kokkos11impl_resizeIJEPPiJNS_11LayoutRightENS_4CudaEvEEENSt9enable_ifIXoosrSt7is_sameINS_4ViewIT0_JDpT1_EE12array_layoutENS_10LayoutLeftEE5valuesrS6_ISC_S3_E5valueEvE4typeERSB_mmmmmmmmDpRKT_+0x161)[0x7f821f82bd21]
[vision1:3276138] [13] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN6Kokkos8DualViewIPPiNS_11LayoutRightENS_4CudaEvE11impl_resizeIJEEEvmmmmmmmmDpRKT_+0x5d9)[0x7f6c1ca81929]
[vision1:3276139] [14] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN6Kokkos8DualViewIPPiNS_11LayoutRightENS_4CudaEvE11impl_resizeIJEEEvmmmmmmmmDpRKT_+0x5d9)[0x7f821f82f929]
[vision1:3276138] [14] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS14GridCommKokkosIN6Kokkos4CudaEE7indicesERNS1_8DualViewIPPiNS1_11LayoutRightES2_vEEiiiiiii+0x472)[0x7f6c1d9b1f12]
[vision1:3276139] [15] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS14GridCommKokkosIN6Kokkos4CudaEE7indicesERNS1_8DualViewIPPiNS1_11LayoutRightES2_vEEiiiiiii+0x472)[0x7f822075ff12]
[vision1:3276138] [15] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS14GridCommKokkosIN6Kokkos4CudaEE13setup_regularERiS4_+0x306)[0x7f6c1d9b2326]
[vision1:3276139] [16] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS14GridCommKokkosIN6Kokkos4CudaEE13setup_regularERiS4_+0x306)[0x7f8220760326]
[vision1:3276138] [16] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS10PPPMKokkosIN6Kokkos4CudaEE8allocateEv+0x15a1)[0x7f6c1e191641]
[vision1:3276139] [17] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS10PPPMKokkosIN6Kokkos4CudaEE8allocateEv+0x15a1)[0x7f8220f3f641]
[vision1:3276138] [17] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS10PPPMKokkosIN6Kokkos4CudaEE4initEv+0x5f8)[0x7f6c1e188358]
[vision1:3276139] [18] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS10PPPMKokkosIN6Kokkos4CudaEE4initEv+0x5f8)[0x7f8220f36358]
[vision1:3276138] [18] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS5Force4initEv+0x66)[0x7f821f6332d6]
[vision1:3276138] [19] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS5Force4initEv+0x66)[0x7f6c1c8852d6]
[vision1:3276139] [19] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS6LAMMPS4initEv+0x16)[0x7f6c1c8e8096]
[vision1:3276139] [20] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS6LAMMPS4initEv+0x16)[0x7f821f696096]
[vision1:3276138] [20] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS8Minimize7commandEiPPc+0x1a6)[0x7f6c1c952026]
[vision1:3276139] [21] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS8Minimize7commandEiPPc+0x1a6)[0x7f821f700026]
[vision1:3276138] [21] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS5Input15execute_commandEv+0xb42)[0x7f6c1c8dddd2]
[vision1:3276139] [22] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS5Input15execute_commandEv+0xb42)[0x7f821f68bdd2]
[vision1:3276138] [22] /opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS5Input4fileEv+0x166)[0x7f6c1c8de2d6]
[vision1:3276139] [23] lmp[0x4013fa]
[vision1:3276139] [24] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f6c1ada80b3]
[vision1:3276139] [25] lmp[0x40160e]
[vision1:3276139] *** End of error message ***
/opt/software/software/LAMMPS/23Jun2022-foss-2021a-kokkos-CUDA-11.3.1/lib/liblammps.so.0(_ZN9LAMMPS_NS5Input4fileEv+0x166)[0x7f821f68c2d6]
[vision1:3276138] [23] lmp[0x4013fa]
[vision1:3276138] [24] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f821db560b3]
[vision1:3276138] [25] lmp[0x40160e]
[vision1:3276138] *** End of error message ***

Thanks in advance for the help

With less than a thousand atoms there is no point in using GPU acceleration. You simply don’t have enough work units to make good use of a single GPU, let alone two.
Have you tried running without KOKKOS enabled?

On the CPU you should use MPI parallelization.

If it still fails, please provide the complete input deck.

Hello Dr. Axel, thank you so much for your fast answer.

I am aware that GPU is not worth it for these small systems. However, this is just a debug system. The experimental system has 72k atoms.

I ran the same input on the CPU with MPI parallelization and got the “Must redefine kspace_style after changing to triclinic box” error.
I moved the kspace_style command to after the read_data command and reran the simulation.
Everything went fine on the CPU, so I tried the corrected input on the GPU as well.

I’m happy to say that, after these changes, the simulation ran smoothly.

TLDR: The GPU error vanished once I defined the kspace_style after the read_data command. However, the GPU error message itself was not very clear and did not point toward this solution.
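For anyone hitting the same thing, here is a sketch of the reordered input. It assumes the only change is the position of the kspace_style line; everything else is copied from the script in my first post.

```
units           real
atom_style      full
boundary        p p p

pair_style      lj/cut/coul/long 12.500
bond_style      harmonic
angle_style     cosine/squared
dihedral_style  charmm
improper_style  umbrella

special_bonds   dreiding
dielectric      1.0
pair_modify     tail yes mix arithmetic
box tilt        large
read_data       data.COF-42

# kspace_style now comes after read_data, i.e. after the box has become triclinic
kspace_style    pppm 1.0e-6
```

The group and minimize commands follow unchanged.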

Once again, thanks a lot for the attention.

That is because the KOKKOS version of the PPPM kspace style was missing the check that triggered the error on the CPU. It looks like you have run into a six-year-old memory corruption bug. Accelerated packages replicate a number of code paths, so every bugfix and enhancement has to be applied consistently to all of them. Most fixes are detected and applied first in the plain version of a style, and not everybody is always aware of all the possible variants. Over the past six years we have systematically improved testing to make such cases less likely, but it looks like this one slipped through the cracks.
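As a rough consistency check on the memory-corruption reading, consider the size in the error message. Assuming the "G" in the Kokkos message denotes 2^30 bytes (an assumption about how Kokkos formats sizes, not something stated in the log), the requested allocation works out to about 2^64 bytes:

$$
1.718 \times 10^{10}\ \mathrm{G} \times 2^{30}\ \frac{\text{bytes}}{\mathrm{G}} \approx 1.845 \times 10^{19}\ \text{bytes} \approx 2^{64}\ \text{bytes}
$$

That is far beyond what any GPU could provide for a 924-atom system, which is consistent with a garbage element count rather than a genuine memory requirement.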

This will be remedied in the next feature release of LAMMPS, which is planned for tomorrow.

Since you are using KOKKOS, it is advisable to update to either the latest update (#4) of the stable version or the latest feature release (28 March 2023, or possibly 15 June 2023 if all works out as planned), since they contain many KOKKOS-related bugfixes and updates.

Glad you figured out the problem. Let us know if you have more issues.