How to deal with the error about memory

Hi! I got this error in my system:

ERROR on proc 0: Failed to allocate -108972 bytes for array reaxff/species:molmap (…/memory.cpp:66)

My system contains about 13,000 atoms: 50 C18 organic molecules, 50 hydrogen molecules, and an iron-based catalyst of about 5 nanometers in the middle. The cubic box is 20 nanometers on a side. I built the PDB files of all the molecules first, then used Packmol to pack them into the box with the catalyst fixed in the center. The simulation uses the reactive force field (ReaxFF).

Someone suggested that the initial configuration might be poor, but I don’t know how to correct it or how to solve this memory problem.

Thank you all!

There are multiple issues at hand. Whether you have a bad initial geometry can be checked through visualization. Sometimes the problem is not the geometry itself, but incorrect box dimensions.

The error at hand is apparently coming from a non-essential command: fix reaxff/species.
So the first step is to eliminate that command from your input. If that is not sufficient, remove other non-essential commands.
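For a ReaxFF input, that usually just means commenting out the species analysis, roughly like this untested sketch (the fix ID, intervals, and filename are placeholders; match them to your own input):

#fix spec all reaxff/species 10 10 100 species.out
run 1000        # if this now runs, re-enable the removed commands one at a time to find the culprit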

Then the second step is to monitor your memory usage - assuming that your simulation can continue without the non-essential commands - and check how much each command adds to the baseline, and how that depends on the command's settings. Often there are settings, e.g. those controlling averaging, that significantly impact memory use. Most important is to identify which individual command causes the most significant problem, and then carefully study its documentation and apply common sense and scientific thinking.
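One way to make that comparison is the LAMMPS info command, which reports the memory footprint of the current process; the "Per MPI rank memory allocation" line printed at the start of every run gives similar information. A minimal sketch:

run 0 post no      # set up the system without timestepping
info memory        # print the current memory usage of this LAMMPS process
# add one analysis or averaging command at a time, repeat the two lines above,
# and compare the reported numbers to see what each command costs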

If that provides no help, you need to produce a minimal test case that easily reproduces the problem, where you have made (and explained) a best effort to narrow down which settings cause the problem, why you need them to be that way, and why you cannot change them to something that may allow the simulation to continue.

You could also try using the KOKKOS version of ReaxFF, which is more robust with respect to memory. Kokkos also has tools to easily profile memory use. However, most likely this is a bad initial configuration, as mentioned. You could try relaxing the initial system through energy minimization or velocity rescaling; probably not as a final solution, since it could mask a problem, but to see whether that makes the issue go away.
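A minimal sketch of such a relaxation, assuming your ReaxFF pair style and charge equilibration fix are already defined earlier in the input (the tolerances and step counts are placeholders to adjust):

minimize 1.0e-4 1.0e-6 1000 10000     # relax close contacts left over from packing
# or, alternatively, cap the per-step displacement during a short initial run:
#fix relax all nve/limit 0.1
#run 1000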

are you talking about -DKokkos_ENABLE_DEBUG=on or -DKokkos_ENABLE_CUDA_UVM=on ?

I’ll try to rebuild LAMMPS tomorrow with both, because I’m getting memory allocation errors:

Exception: Kokkos failed to allocate memory for label “atom:dihedral_atom1”. Allocation using MemorySpace named “Cuda” failed with the following error: Allocation of size 115.1 M failed, likely due to insufficient memory. (The allocation mechanism was cudaMalloc(). The Cuda allocation returned the error code “cudaErrorMemoryAllocation”.)

Exception: Kokkos failed to allocate memory for label “atom:dihedral_atom3”. Allocation using MemorySpace named “Cuda” failed with the following error: Allocation of size 116.3 M failed, likely due to insufficient memory. (The allocation mechanism was cudaMalloc(). The Cuda allocation returned the error code “cudaErrorMemoryAllocation”.)

on my PDB 2CV5 GPU simulation:

614751 atoms
72 atom types
419078 bonds
123 bond types
243891 angles
267 angle types
81972 dihedrals
550 dihedral types
3524 impropers
25 improper types
974 crossterms

-93 93 xlo xhi
-93 93 ylo yhi
-93 93 zlo zhi

mpirun -np 32 ~/.local/bin/lmp -in step5_production.inp -k on g 4 -sf kk

echo            screen
variable        dcdfreq index 5000
variable        outputname index step5_production
variable        inputname  index step4.1_equilibration

units           real
boundary        p p p
newton          off

# --- IF KOKKOS ---
    atom_style      full/kk
    bond_style      harmonic/kk
    angle_style     charmm/kk

    improper_style  harmonic # try without kk to solve memory overflow


# --- TO BE IMPLEMENTED IN KOKKOS (BIG TODO)
#    pair_style      lj/charmmfsw/coul/long/kk 10 12
#    dihedral_style  charmmfsw/kk
#    special_bonds   charmm/kk
    pair_style      lj/charmmfsw/coul/long 10 12
    dihedral_style  charmmfsw
    special_bonds   charmm


# --- IF NOT KOKKOS ---
#    kspace_style    pppm 1e-4
#    atom_style      full
#    bond_style      harmonic
#    angle_style     charmm
#    improper_style  harmonic


pair_modify     mix arithmetic
#kspace_modify fftbench yes

fix             cmap all cmap charmmff.cmap
fix_modify      cmap energy yes
read_data       step3_input.data fix cmap crossterm CMAP
run_style 	verlet/kk

variable        laststep file ${inputname}.dump
next            laststep
read_dump       ${inputname}.dump ${laststep}  x y z vx vy vz ix iy iz box yes replace yes format native

kspace_style    pppm/kk 1e-4
neighbor        2 bin
neigh_modify    delay 5 every 1

include         restraints/constraint_angletype
fix             1 all shake/kk 1e-6 500 0 m 1.008 a ${constraint_angletype}
fix             2 all npt/kk temp 303.15 303.15 100.0 iso   0.9869233 0.9869233 1000 couple  xyz mtk no pchain 0

thermo          10
thermo_style    one
dump 		lammpstrj all custom 10 3ft6.lammpstrj id mol type x y z ix iy iz
reset_timestep  0
timestep        2

variable ps loop 1000
	 label loop
	 print "================== TIME = $(v_ps/1000:%.3f) ns =================="
	 run 50
	 #write_restart 2cv5-solution-timestep_*.lammpsrestart
	 next ps
jump SELF loop

undump lammpstrj

This worked fine CPU-only, but was obviously pretty slow.

Kokkos on the GPU runs fine on a fairly minimal example, PDB 3FT6:

             30374  atoms
             20380  bonds
             10782  angles
              1448  dihedrals
                38  impropers

                45  atom types
                66  bond types
               128  angle types
               240  dihedral types
                11  improper types

    -34.5000      34.5000 xlo xhi
    -34.5000      34.5000 ylo yhi
    -34.5000      34.5000 zlo zhi

but there is little hope at this point of getting to PDB 6HKT. The CHARMM-GUI job has been running for 3 days and has not even finished the solvation step yet (a lot more complicated than you would expect):

    168697 !NATOM
    174941 !NBOND: bonds
    317565 !NTHETA: angles
    462018 !NPHI: dihedrals
     23247 !NIMPHI: impropers
     17668 !NDON: donors
     25274 !NACC: acceptors
      5988 !NCRTERM: cross-terms

Even the 3D printer at the faculty of engineering choked on that one!

What GPUs do you have?

are you talking about -DKokkos_ENABLE_DEBUG=on or -DKokkos_ENABLE_CUDA_UVM=on ?

No, I mean the space-time-stack tool in the profiling directory of the kokkos/kokkos-tools repository on GitHub (develop branch).
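Roughly, you build that tool and point Kokkos at the resulting shared library before launching; the sketch below uses placeholder paths and a placeholder library filename, and the exact environment variable name is documented in the kokkos-tools README:

export KOKKOS_TOOLS_LIBS=/path/to/kokkos-tools/profiling/space-time-stack/kp_space_time_stack.so
mpirun -np 4 ~/.local/bin/lmp -in step5_production.inp -k on g 4 -sf kk
# a timing and memory-usage report is printed when the run finishes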

mpirun -np 32 ~/.local/bin/lmp -in step5_production.inp -k on g 4 -sf kk

Typically 1 MPI rank per GPU is best, unless you have styles that are not yet ported to Kokkos. If you use multiple MPI ranks per GPU, then you MUST enable CUDA MPS (see the NVIDIA Multi-Process Service documentation) to get good performance.
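A rough sketch of what that looks like on a node with 4 GPUs (standard MPS daemon invocation; adjust rank counts to your hardware):

nvidia-cuda-mps-control -d        # start the MPS control daemon on the node
mpirun -np 32 ~/.local/bin/lmp -in step5_production.inp -k on g 4 -sf kk    # 8 ranks now share each GPU via MPS
echo quit | nvidia-cuda-mps-control    # stop the daemon when done
# without MPS, prefer 1 rank per GPU:  mpirun -np 4 ~/.local/bin/lmp -in step5_production.inp -k on g 4 -sf kk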

kspace_style pppm/kk 1e-4

If you use the suffix flag on the command line (-sf kk), then there is no need to add /kk everywhere in the input file.
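For example, with -sf kk the plain style names are sufficient, and any style without a Kokkos variant automatically falls back to the regular implementation (a sketch, not your full input):

atom_style      full
bond_style      harmonic
pair_style      lj/charmmfsw/coul/long 10 12
kspace_style    pppm 1e-4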

available node types:

Béluga: 40 cores, 4 V100-16gb
Cedar p100: 24 cores, 4 P100-12gb
Cedar p100l: 24 cores, 4 P100-16gb
Cedar v100l: 32 cores, 4 V100-32gb
Graham p100: 32 cores, 2 P100-12gb
Graham v100: 28 cores, 8 V100-16gb
Graham t4: 44 cores, 4 T4-16gb
Mist: 32 cores, 4 V100-32gb
Narval: 48 cores, 4 A100-40gb

I added a “Troubleshooting memory allocation on GPUs” subsection to Speed_kokkos.rst as part of PR 4028.

This is actually quite rare in the academic world, where most clusters have far more CPUs than GPUs (see above). Demand for GPUs is much higher and supply is limited, given that almost everybody is doing “machine learning” these days. In my case, GPU-hours are charged at 4x-10x multiples of CPU-hours, so I need to optimize how computation is spread across CPUs and GPUs. I can also get a lot of low-priority CPU hours at no quota cost.

Oops, I forgot to put that in the Speed_kokkos.rst PR 4028. Is there a way to add a second commit to a pull request in progress?
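In case it is just a matter of pushing more commits to the same branch, this rough sketch is what I would try (the branch name is a placeholder for whatever the PR was opened from):

git checkout my-speed-kokkos-branch       # the branch behind the open PR (placeholder name)
# edit doc/src/Speed_kokkos.rst, then:
git add doc/src/Speed_kokkos.rst
git commit -m "add note on profiling Kokkos memory use"
git push origin my-speed-kokkos-branch    # the open PR picks up the new commit automatically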

Sorry, I prefer to be explicit so I know exactly what’s going on. That way I can track which commands are being accelerated where, and which ones are missing and need to be contributed.

Added @stamoor’s comment to a second commit on PR 4028.