Hi all, I ran into a problem running a parallel job on a very large system.
I ran a "minimize" calculation on two identical systems that differ only in size: one contains 30,000 atoms, while the other contains 700,000 atoms.
I used almost the same input script for both, except that for the larger system I changed the "neigh_modify" settings to "delay 0 every 1 check yes page 5000000".
I added "page 5000000" to the second input script because LAMMPS kept reporting "Neighbor list overflow, boost neigh_modify one or page".
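For reference, the neighbor-related lines in the larger system's input script look roughly like this (the skin distance and minimize tolerances shown here are placeholders for illustration, not my exact values):

    neighbor     2.0 bin                                 # placeholder skin distance
    neigh_modify delay 0 every 1 check yes page 5000000  # the setting I changed
    minimize     1.0e-4 1.0e-6 1000 10000                # placeholder tolerances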
I used 4 processors for the small system, and it ran successfully.
However, when I used 8 processors for the much larger system, the calculation stopped with the following output:
MPI: On host altix9, Program /home/altb/houlongz/minimize/lmp_altix, Rank 0, Process 31859 received signal SIGSEGV(11)
MPI: --------stack traceback-------
Internal Error: Can’t read/write file “/dev/mmtimer”, (errno = 22)
MPI: Intel® Debugger for applications running on IA-64, Version 9.1-28, Build 20070305
MPI: Reading symbolic information from /home/altb/houlongz/minimize/lmp_altix…done
MPI: Attached to process id 31859 …
MPI: stopped at [0xa000000000010641]
MPI: >0 0xa000000000010641
MPI: #1 0x20000000059f7b00 in __waitpid(…) in /lib/tls/libc.so.6.1
MPI: #2 0x20000000000fb710 in MPI_SGI_stacktraceback(…) in /usr/lib/libmpi.so
MPI: #3 0x20000000000fc770 in slave_sig_handler(…) in /usr/lib/libmpi.so
MPI: #4 0xa0000000000107e0
MPI: #5 0x4000000000297680 in _ZN9LAMMPS_NS8Neighbor8full_binEPNS_9NeighListE(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #6 0x400000000028bc40 in _ZN9LAMMPS_NS8Neighbor5buildEv(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #7 0x400000000026c1a0 in _ZN9LAMMPS_NS5MinCG5setupEv(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #8 0x400000000026af50 in _ZN9LAMMPS_NS5MinCG3runEv(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #9 0x400000000026fcd0 in _ZN9LAMMPS_NS8Minimize7commandEiPPc(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #10 0x40000000002502b0 in _ZN9LAMMPS_NS5Input15execute_commandEv(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #11 0x4000000000256ec0 in _ZN9LAMMPS_NS5Input4fileEv(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #12 0x4000000000265c60 in main(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #13 0x2000000005915c50 in __libc_start_main(…) in /lib/tls/libc.so.6.1
MPI: #14 0x4000000000004f00 in _start(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: -----stack traceback ends-----
MPI: On host altix9, Program /home/altb/houlongz/minimize/lmp_altix, Rank 0, Process 31859: Dumping core on signal SIGSEGV(11) into directory /home/altb/houlongz/minimize
MPI: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
MPI: aborting job
MPI: Received signal 11
I don't know where the problem comes from. Could it be related to adding "page 5000000" to the "neigh_modify" command?
In addition, I would really appreciate any suggestions on running parallel calculations of very large systems (close to one million atoms) with LAMMPS,
in particular which commands I should pay special attention to. Many thanks.