Regarding running a LAMMPS Program on HPC

Dear all,
I am trying to run a simulation on an HPC cluster. I have defined a region, and when its extent goes beyond 10 Angstrom the run fails with the error shown below. My region is defined as
region whole block 0 20.0 -80.0 80.0 -80.0 80.0
so the upper bound along the x-axis is 20.

Whenever the x extent goes beyond 10, the run always fails with the error shown below.
The error appears when running on multiple processors (nodes=5; ppn=20).
When we run the same input on a single processor there is no error, so the program currently only runs on a single processor. Can anybody tell me how to debug this problem?

With Regards,
Ritesh Satwani

ERROR ON HPC

[warn] Epoll ADD(4) on fd 1 failed. Old events were 0; read change was 0 (none); write change was 1 (add): Operation not permitted

[warn] Epoll ADD(4) on fd 2 failed. Old events were 0; read change was 0 (none); write change was 1 (add): Operation not permitted
[c5d:1279 :0] Caught signal 11 (Segmentation fault)
[c5d:1281 :0] Caught signal 11 (Segmentation fault)
[c5d:1278 :0] Caught signal 11 (Segmentation fault)
[c5d:1276 :0] Caught signal 11 (Segmentation fault)
[c5d:1277 :0] Caught signal 11 (Segmentation fault)
[c5d:1272 :0] Caught signal 11 (Segmentation fault)
[c4d:30569:0] Caught signal 11 (Segmentation fault)
[c4d:30585:0] Caught signal 11 (Segmentation fault)
[c4d:30572:0] Caught signal 11 (Segmentation fault)
[c4d:30566:0] Caught signal 11 (Segmentation fault)
[c4d:30587:0] Caught signal 11 (Segmentation fault)
[c4d:30576:0] Caught signal 11 (Segmentation fault)
[c4d:30573:0] Caught signal 11 (Segmentation fault)
[c4d:30565:0] Caught signal 11 (Segmentation fault)
[c5a:30192:0] Caught signal 11 (Segmentation fault)
[c5a:30188:0] Caught signal 11 (Segmentation fault)
[c5a:30196:0] Caught signal 11 (Segmentation fault)
[c5a:30186:0] Caught signal 11 (Segmentation fault)
[c5a:30172:0] Caught signal 11 (Segmentation fault)
[c5a:30173:0] Caught signal 11 (Segmentation fault)
[c5a:30181:0] Caught signal 11 (Segmentation fault)
[c4d:30570:0] Caught signal 11 (Segmentation fault)
[c4d:30575:0] Caught signal 11 (Segmentation fault)
[c4d:30582:0] Caught signal 11 (Segmentation fault)
[c5a:30171:0] Caught signal 11 (Segmentation fault)
[c4d:30574:0] Caught signal 11 (Segmentation fault)
[c5b:31329:0] Caught signal 11 (Segmentation fault)
[c5b:31333:0] Caught signal 11 (Segmentation fault)
[c5b:31328:0] Caught signal 11 (Segmentation fault)
[c5b:31332:0] Caught signal 11 (Segmentation fault)
[c5b:31337:0] Caught signal 11 (Segmentation fault)
[c5b:31340:0] Caught signal 11 (Segmentation fault)
[c5b:31339:0] Caught signal 11 (Segmentation fault)
[c5b:31354:0] Caught signal 11 (Segmentation fault)
[c5b:31342:0] Caught signal 11 (Segmentation fault)
[c5b:31338:0] Caught signal 11 (Segmentation fault)
==== backtrace ====
2 0x000000000006397c mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.2.2989/src/mxm/util/debug/debug.c:641
==== backtrace ====
2 0x000000000006397c mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.2.2989/src/mxm/util/debug/debug.c:641
3 0x0000000000063aec mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.2.2989/src/mxm/util/debug/debug.c:616
4 0x0000003c3e2329a0 killpg() ??:0
5 0x000000000084b41c _ZN9LAMMPS_NS8Neighbor8full_binEPNS_9NeighListE() /scratch/compile/mukesh.gcc/mukesh/lammps/lammps-9Dec14/src/Obj_ompi_g++/…/neigh_full.cpp:305
6 0x00000000008429d2 _ZN9LAMMPS_NS8Neighbor9build_oneEPNS_9NeighListEi() /scratch/compile/mukesh.gcc/mukesh/lammps/lammps-9Dec14/src/Obj_ompi_g++/…/neighbor.cpp:1555
7 0x00000000004f3322 _ZN9LAMMPS_NS16ComputeCoordAtom15compute_peratomEv() /scratch/compile/mukesh.gcc/mukesh/lammps/lammps-9Dec14/src/Obj_ompi_g++/…/compute_coord_atom.cpp:144
8 0x00000000005b4bc4 _ZN9LAMMPS_NS10DumpCustom5countEv() /scratch/compile/mukesh.gcc/mukesh/lammps/lammps-9Dec14/src/Obj_ompi_g++/…/dump_custom.cpp:417
9 0x00000000005ace7a _ZN9LAMMPS_NS4Dump5writeEv() /scratch/compile/mukesh.gcc/mukesh/lammps/lammps-9Dec14/src/Obj_ompi_g++/…/dump.cpp:292
10 0x0000000000870d88 _ZN9LAMMPS_NS6Output5writeEl() /scratch/compile/mukesh.gcc/mukesh/lammps/lammps-9Dec14/src/Obj_ompi_g++/…/output.cpp:303
11 0x0000000000feea30 _ZN9LAMMPS_NS6Verlet3runEi() /scratch/compile/mukesh.gcc/mukesh/lammps/lammps-9Dec14/src/Obj_ompi_g++/…/verlet.cpp:305
12 0x0000000000fb6a27 _ZN9LAMMPS_NS3Run7commandEiPPc() /scratch/compile/mukesh.gcc/mukesh/lammps/lammps-9Dec14/src/Obj_ompi_g++/…/run.cpp:175
13 0x00000000007d3393 _ZN9LAMMPS_NS5Input15command_creatorINS_3RunEEEvPNS_6LAMMPSEiPPc() /scratch/compile/mukesh.gcc/mukesh/lammps/lammps-9Dec14/src/Obj_ompi_g++/…/input.cpp:631
14 0x00000000007d19eb _ZN9LAMMPS_NS5Input15execute_commandEv() /scratch/compile/mukesh.gcc/mukesh/lammps/lammps-9Dec14/src/Obj_ompi_g++/…/input.cpp:614
15 0x00000000007d2cbc _ZN9LAMMPS_NS5Input4fileEv() /scratch/compile/mukesh.gcc/mukesh/lammps/lammps-9Dec14/src/Obj_ompi_g++/…/input.cpp:225
16 0x00000000007e2379 main() /scratch/compile/mukesh.gcc/mukesh/lammps/lammps-9Dec14/src/Obj_ompi_g++/…/main.cpp:31
17 0x0000003c3e21ed1d __libc_start_main() ??:0
18 0x000000000041e3b9 _start() ??:0
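
The function names in the backtrace are mangled C++ symbols, which makes the call chain hard to read. As a small sketch (assuming the `c++filt` tool from GNU binutils is available on the machine, which the original post does not state), the three innermost LAMMPS frames can be demangled like this:

```shell
# Demangle three frames copied verbatim from the backtrace above.
# Assumption: c++filt (GNU binutils) is installed.
demangled=$(printf '%s\n' \
    _ZN9LAMMPS_NS8Neighbor8full_binEPNS_9NeighListE \
    _ZN9LAMMPS_NS16ComputeCoordAtom15compute_peratomEv \
    _ZN9LAMMPS_NS10DumpCustom5countEv | c++filt)
echo "$demangled"
```

Read from the bottom up, the frames indicate that the segmentation fault occurs while a dump of style custom evaluates a coord/atom compute, which in turn triggers a full neighbor list build in neigh_full.cpp.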

two suggestions:

  • run tests with the inputs provided by LAMMPS in the bench and examples directory. if examples like in.lj don’t work, you have a serious problem.
  • if it only happens with your specific input, compile the latest development version of LAMMPS and test that. since you use an older version of LAMMPS, this may be a problem that has already been addressed.
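
A minimal sketch of the first suggestion, to be run from the LAMMPS bench directory. The executable name `lmp_ompi_g++` is only a guess based on the `Obj_ompi_g++` build directory in the backtrace, and `mpirun` as the launcher and the rank counts 1 and 4 are likewise assumptions; adjust them to your installation and batch system. By default the script only prints the commands; set `RUN=1` to execute them.

```shell
# Assumptions (not from the original post): binary name, launcher, rank counts.
LMP="${LMP:-./lmp_ompi_g++}"    # hypothetical path to the LAMMPS executable
INPUT="${INPUT:-in.lj}"         # benchmark input shipped with LAMMPS

# Build the launch command for a given number of MPI ranks.
bench_cmd() {
    echo "mpirun -np $1 $LMP -in $INPUT"
}

# Print the serial and the parallel command; execute them only if RUN=1.
for np in 1 4; do
    cmd=$(bench_cmd "$np")
    echo "$cmd"
    if [ "${RUN:-0}" = "1" ]; then
        $cmd || echo "run with $np ranks failed"
    fi
done
```

If the benchmark runs cleanly with both rank counts, the crash is more likely tied to the specific input than to the LAMMPS build or the MPI installation.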

axel.