Error while running on cluster

Good day!

I am not sure if this is entirely LAMMPS-related, but if someone has had a similar issue before, I am hoping you can help me. I am trying to run the following script (graphene equilibration) on a cluster, but it gets stuck on timestep 852, after which my job is eventually killed for exceeding the walltime. When I run the same script on my own machine (with fewer processors), it works fine. Funnily enough, the same script with two layers of graphene runs fine on the cluster. Can anyone help me out or point me in the right direction?

I am using openmpi/1.8.4-gcc-4.9.2. I have also attached my log file.

#Input file

clear
echo both

#--------------------------INITIALISATION-------------------------------
units real
dimension 3
boundary p p p
atom_style charge
read_data graphene.dat

#neigh_modify every 10 delay 0 check no

#--------------------------SET POTENTIAL--------------------------------

pair_style reax/c NULL checkqeq no
pair_coeff * * ffield.reax C N
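# note: 'checkqeq no' tells reax/c not to require a charge-equilibration
# fix, so the charges read from graphene.dat stay fixed during the run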

#--------------------------MINIMISE SYSTEM-------------------------
dump min all atom 150 result_min.lammpstrj
minimize 1.0e-7 1.0e-9 100000 100000
undump min
reset_timestep 0

#-----------------------EQUILIBRATE SYSTEM---------------------------
timestep 0.25

dump equi all atom 1 result_equi.lammpstrj
thermo 1
thermo_style custom step temp etotal spcpu

velocity all create 300.0 4928459 rot yes dist gaussian
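
# ramp T from 300 K to 677 K at 1 atm, isotropically; Tdamp = 10 fs and
# Pdamp = 100 fs (damping parameters are in time units, i.e. fs for 'units real')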
fix ensemble all npt temp 300.0 677.0 10.0 iso 1.0 1.0 100.0
run 100000

unfix ensemble
fix ensemble all npt temp 677.0 677.0 10.0 iso 1.0 1.0 100.0
run 60000

undump equi
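
# the '.mpiio' suffix makes LAMMPS write the restart file in parallel
# via MPI-IO (requires a build with the MPIIO package)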
write_restart restart.mpiio

log.equilibration (2.47 KB)

John,

Thank you for your reply.

    What version of LAMMPS were you using,

LAMMPS 16 Feb 2016.

    how many atoms did you have,

512 atoms (4 layers of 128 carbon atoms).

    how many MPI processes did you use on the compute cluster and on your laptop/workstation,

I used 4 on my laptop and tried with 8 ppn and 4 ppn on the cluster.

    how did the box dimensions change under NPT,

I would say by about 1%.

    have you run a NVE run,

No, I haven't; I wanted isotropic graphite.

    and finally why did you not use fix qeq/reax for charge equilibration?

Is this a necessity? My simulation requires only van der Waals interactions to be checked, so I thought it wasn't required.

I would run NVE with charge equilibration turned on (fix qeq/reax) on your cluster and see how it goes. Visualize your simulation, and then switch to NPT. Try to write dump files more frequently so you can visualize what went wrong around step 850. Other than that, not much can be said.
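
For example, a rough sketch of what that debug run could look like, reusing your data file and force field (the qeq settings, dump interval, and run length below are only placeholders to adapt):

pair_style reax/c NULL
pair_coeff * * ffield.reax C N

fix qeq all qeq/reax 1 0.0 10.0 1.0e-6 reax/c   # QEq charge equilibration every step
fix integrate all nve                           # constant-volume check first
dump nve all atom 10 result_nve.lammpstrj       # dump often enough to catch step ~850
run 2000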

OK, thank you. I tried using a debugger (gdb) to check the problem. It seems there is a segmentation fault at that point, after 852 timesteps. This is the error message I received:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000828fe2 in BOp (workspace=0xb67d00, bonds=0xb67fc0, bo_cut=0.001, i=0, btop_i=0, nbr_pj=0x7fffed2237c0, sbp_i=0xc6ec60,
sbp_j=0x2e8e739060, twbp=0x2ba598d460) at …/reaxc_bond_orders.cpp:274
274 if( sbp_i->r_s > 0.0 && sbp_j->r_s > 0.0 ) {
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.166.el6_7.7.x86_64
(gdb) where
#0 0x0000000000828fe2 in BOp (workspace=0xb67d00, bonds=0xb67fc0, bo_cut=0.001, i=0, btop_i=0, nbr_pj=0x7fffed2237c0, sbp_i=0xc6ec60,
sbp_j=0x2e8e739060, twbp=0x2ba598d460) at …/reaxc_bond_orders.cpp:274
#1 0x000000000082eaf0 in Init_Forces_noQEq (system=0xb66750, control=Unhandled dwarf expression opcode 0xf3
) at …/reaxc_forces.cpp:291
#2 0x000000000082f3ec in Compute_Forces (system=0xb66750, control=0xb671d0, data=0xb677b0, workspace=0xb67d00, lists=0xb57390,
out_control=0xb68120, mpi_data=0xb68230) at …/reaxc_forces.cpp:446
#3 0x0000000000797355 in LAMMPS_NS::PairReaxC::compute (this=0xb57000, eflag=Unhandled dwarf expression opcode 0xf3
) at …/pair_reax_c.cpp:509
#4 0x000000000087a1c5 in LAMMPS_NS::Verlet::run (this=0xb4d770, n=5000) at …/verlet.cpp:293
#5 0x0000000000850dd8 in LAMMPS_NS::Run::command (this=0x7fffffffd800, narg=Unhandled dwarf expression opcode 0xf3
) at …/run.cpp:175
#6 0x000000000060319e in LAMMPS_NS::Input::command_creator<LAMMPS_NS::Run> (lmp=Unhandled dwarf expression opcode 0xf3
) at …/input.cpp:723
#7 0x0000000000601a94 in LAMMPS_NS::Input::execute_command (this=0xb3f970) at …/input.cpp:706
#8 0x0000000000602dcc in LAMMPS_NS::Input::file (this=0xb3f970) at …/input.cpp:243
#9 0x00000000006121e3 in main (argc=3, argv=0x7fffffffdaa8) at …/main.cpp:31

Any suggestions?

Thank you again for your time.

Hey,

better use the GDB version that corresponds
to the GCC you compiled your code with.

As for the actual segfault, I cannot say much from this.

Best,
S.

    Hey,

    better use the GDB version that corresponds
    to the GCC you compiled your code with.

this statement makes no sense. gdb is independent from gcc.

    As for the actual segfault, I cannot say much from this.

the stack trace is fine, and since the code has been compiled with -g we
can even see the likely offending suspects, files, and line numbers.
that is as good as it gets.

axel.

        Hey,

        better use the GDB version that corresponds
        to the GCC you compiled your code with.

    this statement makes no sense. gdb is independent from gcc.

Well,
but the "Unhandled dwarf expression opcode" message suggests that this GDB
does not understand the debug information that the GCC-compiled code emits
(e.g. with the -gdwarf-3 option).
More recent versions of GDB should understand it, however.
(Maybe this is not of particular interest here, sorry!)

Best,
S.