Segmentation fault - modified AIREBO

Dear all

Good day.

Recently I have been facing a problem with LAMMPS simulations on the cluster. When I submit jobs, a few of my simulations stop partway through with the following error:

mpirun noticed that process rank 3 with PID 0 on node cph-cXX exited on signal 11 (Segmentation fault).

and the warning:

mbind: Invalid argument

From a web search, I understand that this type of error may occur when an older version of the software is used in parallel. But in my simulations I am using a newer version (lammps/31Mar17), which is the latest available on the university cluster.

The only difference between the simulations is the data file, with different atom orientations and atom counts but the same panel size.

A part of the input file:

------------------------ INITIALIZATION ----------------------------

units metal
dimension 3
atom_style atomic
boundary p p p

-------------------- ATOM DEFINITION 10/17/13----------------------------

read_data coord_PCG.data
mass 1 12.0107

---------------------- FORCE FIELDS 9/23/13------------------------------

pair_style airebo 1.92 1 1
pair_coeff * * CH.airebo C

#################################### GROUP DEFINE #####################################

###################################### Computes ##########################################

some compute command

############################### Defining stress and strain variables ################################

some variable command

###################################### Thermo commands ####################################

thermo 1000
thermo_style custom step temp lx ly lz pxx pyy pzz time pe etotal v_Up1 v_Up2 v_Up3 v_Down1 v_Down2 v_Down3 v_p2 v_p3 v_p4 v_p5 v_p6 v_p7 v_p8 v_L1 v_L2 v_L3 v_R1 v_R2 v_R3
thermo_modify line one format float %20.10f
timestep 0.001

dump myDump all custom 10000 dump.lammpstrj id type x y z c_STRESS[1] c_STRESS[2] c_STRESS[3] c_STRESS[4] c_STRESS[5] c_STRESS[6]
dump_modify myDump append yes first yes

####################################### Minimization #########################################

min_style cg
fix 1 all box/relax iso 0.0 vmax 0.001
minimize 1e-25 1e-15 50000 10000
unfix 1

reset_timestep 0

####################################### DEFORMATION ##################################

fix lower Down setforce NULL 0.0 0.0   # lower boundary atoms: zero forces in y and z
fix upper Up setforce NULL 0.0 0.0     # upper boundary atoms: zero forces in y and z
fix left2 left setforce 0.0 NULL 0.0   # left boundary atoms: zero forces in x and z
fix right2 right setforce 0.0 NULL 0.0 # right boundary atoms: zero forces in x and z

############ Equilibrating the system ##############

velocity group2 create 500 9284155
fix 1 group2 nve
fix 2 group2 temp/berendsen 500 500 0.1
run 40000
unfix 1
unfix 2

velocity group2 create 300 9284155
fix 1 group2 nve
fix 2 group2 temp/berendsen 300 300 0.1
run 30000
unfix 1
unfix 2

fix 3 group2 npt temp 300 300 0.1 iso 1.0 1.0 1.0 #Tdamp is 100 times the timestep
run 100000
unfix 3

> Dear all
> Good day.
>
> Recently I face problem in LAMMPS simulations in the cluster. When I run the jobs in cluster few simulation jobs from my all simulations stop automatically in between with the following error.
>
> mpirun noticed that process rank 3 with PID 0 on node cph-cXX exited on signal 11 (Segmentation fault).
>
> and warning
>
> mbind: Invalid argument
>
> From web search, I understand that this type of error may occur when an older version of software use with parallel operation. But in my simulations, I am using the newer version (lammps/31Mar17) from the cluster which is latest in the university cluster.

31 Mar 2017 is not exactly "new"; it is over 20 months old. We just
released a 12 Dec 2018 version.

Overall, you are jumping to conclusions without any evidence. The
information you provide here allows no such conclusion. There are
many more reasons why you can get a segmentation fault.

> The only change in the different simulations is in the data file with different atom orientation and number with same panel size.

Are those crashes reproducible with the same input? Also with
different numbers of CPUs? Can you run in serial? Are there any
warnings or other indications of issues? Does the calculation
stop/crash immediately or only after a while? Have you checked with
the folks operating your cluster that all nodes are functioning
correctly?

axel.

Hello Axel
Thanks for your response.

The crashes are not reproducible: if I rerun the same input, it may sometimes run to completion. I run the simulations on 2 nodes, each with 16 processors. The only warning I have received is "mbind: Invalid argument", and that does not appear with every crash either. The jobs crash after 20-30k equilibration steps, sometimes later. I have also contacted the cluster administrators for their feedback.
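For an intermittent segfault like this, one standard way to get more information is to enable core dumps and inspect the crash location afterwards. A sketch, assuming the cluster permits core files and the LAMMPS binary (called `lmp_mpi` here, an assumption) was built with debug symbols:

```shell
#!/bin/sh
# Allow core files to be written in this shell, then rerun the job.
ulimit -c unlimited
mpirun -np 32 lmp_mpi -in in.airebo

# After a crash, load the core file (the exact filename depends on
# the system's core_pattern setting) and print a stack trace:
#   gdb ./lmp_mpi core
#   (gdb) bt
```

The backtrace shows which routine faulted, which is far more useful to the developers than "signal 11" alone.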