mpirun error

Dear LAMMPS and HPC experts,

I am trying to run LAMMPS on an HPC cluster; we installed LAMMPS (16 Mar 2018). LAMMPS works correctly for small models. However, when the model becomes large (on the order of 100 nm), the computation cannot start and we face this error:

mpirun noticed that process rank 56 with PID 0 on node node03 exited on signal 9 (Killed).

Please let me know whether or not you have any idea how to solve this issue.

The output is listed below:

LAMMPS (16 Mar 2018)

Created orthogonal box = (0 0 0) to (1893 1893 2288.06)
4 by 6 by 6 MPI processor grid
Lattice spacing in x,y,z = 3.786 3.786 9.514
Created 46073888 atoms
Time spent = 5.00858 secs
46073888 atoms in group particle
Setting atom values …
15357926 settings made for charge
Setting atom values …
30715962 settings made for charge
Created 6000000 atoms
Time spent = 0.403862 secs
6000000 atoms in group substrate
Setting atom values …
17357926 settings made for charge
Setting atom values …
34715962 settings made for charge
Created 3000000 atoms
Time spent = 0.485369 secs
3000000 atoms in group lower_substrate
Setting atom values …
18357926 settings made for charge
Setting atom values …
36715962 settings made for charge
52073888 atoms in group model
WARNING: New thermo_style command, previous thermo_modify settings will be lost (/builddir/build/BUILD/lammps-stable_16Mar2018/src/output.cpp:705)
PPPM initialization …
WARNING: System is not charge neutral, net charge = -120.78 (/builddir/build/BUILD/lammps-stable_16Mar2018/src/kspace.cpp:302)
using 12-bit tables for long-range coulomb (/builddir/build/BUILD/lammps-stable_16Mar2018/src/kspace.cpp:321)
G vector (1/distance) = 0.176612
grid = 675 675 768
stencil order = 5
estimated absolute RMS force accuracy = 0.0429126
estimated relative force accuracy = 0.00012923
using double precision FFTs
3d grid and FFT values/proc = 2746450 2733750
Neighbor list info …
update every 1 steps, delay 5 steps, check yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 17
ghost atom cutoff = 17
binsize = 8.5, bins = 223 223 270
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair buck/coul/long, perpetual
attributes: half, newton on
pair build: half/bin/atomonly/newton
stencil: half/bin/3d/newton
bin: standard
Setting up Verlet run …
Unit style : real
Current step : 0
Time step : 1

please note that this is not a well formulated question: if i were to
answer it literally, i would have to tell you "yes" and nothing else.

going beyond that, this doesn't look like a LAMMPS specific problem at
all, but rather something related to the machine you are running on. the
fact that this is triggered by a large system size hints at running out
of available RAM on the nodes you are running on. this is something that
you have to work out with your local system managers, so my suggestion
is that you work with them to determine what is causing it (i.e. whether
your jobs trigger the OOM killer).

axel.

[...]

Dear Axel, 

Thank you very much for your email. I have checked the LAMMPS performance on our HPC system with the machine supervisors. It seems that LAMMPS consumes all the memory on a node until that node runs out of memory and then freezes, no matter how far I reduce the number of nodes I use.

Besides that, the memory consumption on the allocated nodes is not symmetrical: it is much higher on two nodes than on the others.

I already use an MPI command in my script.bash file to distribute the run across all allocated nodes. Should I insert a similar command inside my LAMMPS input file as well?

My input file is below. Please let me know whether I have missed something.

input file:
#Phase 1 ------------------------------------------Simulation main setup----------------------------------
dimension 3
units real
atom_style charge
boundary p p p

variable radius equal 500 # makes 1000 Ang or 100nm particle
variable n equal 500 # box edge = 500*3.786 = 1893 Ang

variable a1 equal 3.786
variable a2 equal 3.786
variable a3 equal 9.514

variable boxx equal "v_n*v_a1"
variable boxy equal "v_n*v_a2"

variable diameter equal "2*v_radius"
variable x0 equal "0.5*v_boxx"
variable y0 equal "0.5*v_boxy"
variable subs_thick equal "3*v_a3" # v_a3 equal 9.514
variable subslow_thick equal "1*v_a3"
variable z_gap equal "0.5*v_radius"
variable distance equal "v_subs_thick+v_subslow_thick+v_z_gap+v_radius"
variable boxz equal "v_distance+v_diameter+v_radius"

#simulation box
region box block 0 ${boxx} 0 ${boxy} 0 ${boxz} units box
create_box 2 box # 2 is the number of atom types

lattice custom 1 a1 3.786 0.00000 0.00000 a2 0.0000 3.786 0.00000 a3 0 0 9.514 &
basis 0.0000 0.2500 0.3750 &
basis 0.0000 0.7500 0.6250 &
basis 0.5000 0.7500 0.8750 &
basis 0.5000 0.2500 0.1250 &
basis 0.0000 0.0000 0.1700 &
basis 0.0000 0.7500 0.4200 &
basis 0.5000 0.2500 0.3300 &
basis 0.5000 0.7500 0.0800 &
basis 0.500 0.5000 0.6700 &
basis 0.500 0.2500 0.9200 &
basis 0.000 0.7500 0.8300 &
basis 0.000 0.2500 0.5800
mass 1 47.86000
mass 2 15.99940

#particle
region particle sphere ${x0} ${y0} ${distance} ${radius} units box
create_atoms 2 region particle &
basis 1 1 &
basis 2 1 &
basis 3 1 &
basis 4 1 &
basis 5 2 &
basis 6 2 &
basis 7 2 &
basis 8 2 &
basis 9 2 &
basis 10 2 &
basis 11 2 &
basis 12 2
group particle region particle
set type 1 charge 2.196
set type 2 charge -1.098

#substrate
region substrate block 0 ${boxx} 0 ${boxy} ${subslow_thick} ${subs_thick} units box
create_atoms 2 region substrate &
basis 1 1 &
basis 2 1 &
basis 3 1 &
basis 4 1 &
basis 5 2 &
basis 6 2 &
basis 7 2 &
basis 8 2 &
basis 9 2 &
basis 10 2 &
basis 11 2 &
basis 12 2
group substrate region substrate
set type 1 charge 2.196
set type 2 charge -1.098
#group model union particle substrate

#lower_substrate
region lower_substrate block 0 ${boxx} 0 ${boxy} 0 ${subslow_thick} units box
create_atoms 2 region lower_substrate &
basis 1 1 &
basis 2 1 &
basis 3 1 &
basis 4 1 &
basis 5 2 &
basis 6 2 &
basis 7 2 &
basis 8 2 &
basis 9 2 &
basis 10 2 &
basis 11 2 &
basis 12 2
group lower_substrate region lower_substrate
set type 1 charge 2.196
set type 2 charge -1.098
group model union particle substrate

#----Phase 2----------------------------------------Buckingham Potential-----------------------------------------------

pair_style buck/coul/long 15
pair_coeff 1 1 717647.40 0.154 121.067
pair_coeff 1 2 391049.10 0.194 290.331
pair_coeff 2 2 271716.30 0.234 696.888

neighbor 2.0 bin # skin distance for real units is by default 2.0
neigh_modify every 1 delay 0 check yes

kspace_style pppm 0.0001

#pair_style lj/cut/coul/cut 6.0 15.0 # cut off is usually 2.5 unitless in lj
#pair_coeff 1 1 0.609 1.9565
#pair_coeff 1 2 0.292 2.4419
#pair_coeff 2 2 0.140 2.9273

neigh_modify delay 5

#----Phase 4-------------------------------------Initial Equilibration at 300K ----------------------------------------
reset_timestep 0
timestep 1.0 # or 2
velocity all create 300 12345 mom yes rot no

fix 1 lower_substrate setforce 0.0 0.0 0.0
fix 2 particle nvt temp 300.0 300.0 100.0
fix 3 substrate nvt temp 300.0 300.0 100.0

thermo 100
dump 1 all xyz 100 dump1000.txt

run 20000
unfix 2

#----Phase 5---------------------------------------Particle Impact at the room temperature -------------------
fix 4 particle nve
velocity particle set 0 0 -0.003 units box

thermo 100
run 50000

Yours Sincerely,

> Dear Axel,
>
> Thank you very much for your email. I have checked the LAMMPS performance on our HPC system with the machine supervisors. It seems that LAMMPS consumes all the memory on a node until that node runs out of memory and then freezes, no matter how far I reduce the number of nodes I use.

reducing the number of nodes doesn't make any sense. on the contrary,
what you would need to do is to *increase* the number of nodes used and
possibly also *decrease* the number of processes per node (and then use
multi-threading). LAMMPS uses domain decomposition, so the per-atom storage
is distributed across MPI ranks; the amount of memory used per MPI
rank corresponds to the number of atoms managed by that rank.
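
to put a rough number on that (an average only, not a measurement), the log above shows 52,073,888 atoms on a 4 by 6 by 6 processor grid:

52,073,888 atoms / (4 x 6 x 6 = 144 MPI ranks) ≈ 362,000 atoms per rank

that average assumes a uniformly filled box. with the large empty gap above your particle, the ranks that own the substrate slabs and the sphere hold far more atoms than that while others hold almost none, and the PPPM grid and the neighbor lists add to the per-rank memory footprint on top of the per-atom data.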

> Besides that, the memory consumption on the allocated nodes is not symmetrical: it is much higher on two nodes than on the others.

yes, most likely because those two nodes have more atoms than the others.

> I already use an MPI command in my script.bash file to distribute the run across all allocated nodes. Should I insert a similar command inside my LAMMPS input file as well?

you are, again, not making much sense here, and you don't seem to
understand either MPI or LAMMPS parallelization. i don't know how to
explain, on that basis, how to address your problem.

> My input file is below. Please let me know whether I have missed something.

you may want to consider using the processors command and possibly also
the balance command to redistribute the workload. but the real issue is
that you have to get a clue about running in parallel with MPI, and
understand LAMMPS' parallelization scheme based on MPI and domain
decomposition.
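
as a minimal sketch only (the grid, thresholds and iteration counts below are illustrative guesses, not tuned values), those two commands could look like this in your input. processors must appear before create_box, while balance needs existing atoms, so it goes after the create_atoms/group section and before the run:

# before create_box: the MPI processor grid can be fixed explicitly
# (this example reproduces the 4x6x6 grid from your log and requires
# exactly 144 ranks; "processors * * *" is the default and lets LAMMPS choose)
processors 4 6 6

# after all create_atoms commands: shift subdomain boundaries along z
# (the direction with the empty gap above the particle) until the
# max/avg atoms-per-rank ratio drops below 1.1, using at most 20 passes
balance 1.1 shift z 20 1.05

# optionally keep rebalancing every 1000 steps during the runs
fix bal all balance 1000 1.1 shift z 20 1.05

also, to be clear: how many MPI processes land on each node is decided by your mpirun line / batch script, not by anything inside the LAMMPS input; the processors command only shapes the decomposition grid. and if (and only if) your binary was built with the USER-OMP package, you could run fewer MPI ranks per node and add threads with "-sf omp -pk omp 4" on the lammps command line.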

axel.