[lammps-users] Problem of Parallelization

Hi all, I ran into a problem when running a parallel job on a very large system.

I ran a “minimize” calculation on two systems that are identical except for size: one contains 30,000 atoms, the other 700,000 atoms.

I used almost the same input script, except that for the larger system I changed “neigh_modify” to “delay 0 every 1 check yes page 5000000”.

I added “page 5000000” to the second input script because LAMMPS kept stopping with the error "Neighbor list overflow, boost neigh_modify one or page".
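For reference, the relevant lines of the larger-system input look roughly like this (the neighbor line is just an illustrative placeholder, not my exact skin value; the neigh_modify line is the one quoted above):

    neighbor        2.0 bin                                    # skin distance / bin style (illustrative only)
    neigh_modify    delay 0 every 1 check yes page 5000000     # only the page value differs from the small-system script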

The small system ran successfully on 4 processors.
However, when I ran the much larger system on 8 processors, the calculation stopped and gave me the following output:

MPI: On host altix9, Program /home/altb/houlongz/minimize/lmp_altix, Rank 0, Process 31859 received signal SIGSEGV(11)

MPI: --------stack traceback-------
Internal Error: Can’t read/write file “/dev/mmtimer”, (errno = 22)
MPI: Intel® Debugger for applications running on IA-64, Version 9.1-28, Build 20070305
MPI: Reading symbolic information from /home/altb/houlongz/minimize/lmp_altix…done
MPI: Attached to process id 31859 …
MPI: stopped at [0xa000000000010641]
MPI: >0 0xa000000000010641
MPI: #1 0x20000000059f7b00 in __waitpid(…) in /lib/tls/libc.so.6.1
MPI: #2 0x20000000000fb710 in MPI_SGI_stacktraceback(…) in /usr/lib/libmpi.so
MPI: #3 0x20000000000fc770 in slave_sig_handler(…) in /usr/lib/libmpi.so
MPI: #4 0xa0000000000107e0
MPI: #5 0x4000000000297680 in _ZN9LAMMPS_NS8Neighbor8full_binEPNS_9NeighListE(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #6 0x400000000028bc40 in _ZN9LAMMPS_NS8Neighbor5buildEv(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #7 0x400000000026c1a0 in _ZN9LAMMPS_NS5MinCG5setupEv(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #8 0x400000000026af50 in _ZN9LAMMPS_NS5MinCG3runEv(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #9 0x400000000026fcd0 in _ZN9LAMMPS_NS8Minimize7commandEiPPc(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #10 0x40000000002502b0 in _ZN9LAMMPS_NS5Input15execute_commandEv(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #11 0x4000000000256ec0 in _ZN9LAMMPS_NS5Input4fileEv(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #12 0x4000000000265c60 in main(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: #13 0x2000000005915c50 in __libc_start_main(…) in /lib/tls/libc.so.6.1
MPI: #14 0x4000000000004f00 in _start(…) in /home/altb/houlongz/minimize/lmp_altix
MPI: -----stack traceback ends-----
MPI: On host altix9, Program /home/altb/houlongz/minimize/lmp_altix, Rank 0, Process 31859: Dumping core on signal SIGSEGV(11) into directory /home/altb/houlongz/minimize
MPI: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
MPI: aborting job
MPI: Received signal 11

I don’t know where the problem comes from. Could it be related to adding “page 5000000” to “neigh_modify”?

In addition, I would really appreciate any suggestions on parallel calculations with very large systems (close to one million atoms) in LAMMPS,
in particular which commands I should pay special attention to. Many thanks.

Hi Zhuang,

I don’t think you have a parallelization problem. I think you have misjudged what a reasonable computational load per processor is. If you have correctly described your systems (30,000 and 700,000 atoms), I seriously doubt that 8 processors will be enough for the bigger system. Your small run had only 7,500 atoms per processor (30,000 / 4), while the large run has 87,500 atoms per processor (700,000 / 8), which is more than your whole smaller system contains. It doesn’t surprise me that the computer choked; I expect it ran out of memory. Try again with a smaller system or more processors.

As a side note, checking your neighbor list every step is probably overkill and will definitely slow down your computation.
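A minimal sketch of a less aggressive setting (typical values, not tuned to your system, so treat them as an assumption):

    neigh_modify    delay 10 every 2 check yes    # check every 2 steps instead of every step; rebuild only when atoms have moved more than half the skin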

Joanne

You shouldn't have to change the neigh_modify page setting when
your system size changes. How many neighbors per atom
are you expecting in your model, given your cutoff, lattice spacing, etc.?
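If you really do expect many neighbors per atom, the more direct knob is usually the "one" setting rather than a huge page. As a rough estimate, neighbors per atom ~ (4/3)*pi*(cutoff + skin)^3 * number density. A minimal sketch with illustrative values (and, if I remember right, page should stay roughly 10x larger than one):

    neigh_modify    one 5000 page 100000    # allow up to ~5000 neighbors per atom; keep page at least ~10x the one value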

Steve