[lammps-users] how to speed up computing for highly heterogeneous system?

_Ming_Hu · June 17, 2008, 2:29am

Dear LAMMPS users,

I have a system with solid, liquid and gas phases. The bottom 10 layers are solid, and above them 90% of the system is occupied by vapor. A small liquid sphere (with a radius about 1/8 of the vapor dimensions) sits in the vapor, in equilibrium with it. The total number of atoms is around 2 million. I used only the LJ potential and 1024 nodes to run the system. I found the computing speed is very low, about 25 min per 10k steps (while for a pure vapor system of the same size I got 6 min per 10k steps). How could I improve the computing efficiency for this kind of system with highly heterogeneous density? I think the slowdown results from the spatial decomposition of atoms across processors: most processors get only several hundred atoms, while a minority get a huge number of atoms, which severely slows down the whole run.

Do you guys have any suggestions to overcome it?
Thanks in advance.

Ming

_Vikas_Varshney2 · June 17, 2008, 1:08pm

Hey Ming,

If your solid layers are in the XY plane and the density gradient is along the Z direction, please try resetting the grid of 1024 processors (manually, using the processors command) to 32 X 32 X 1. Basically, divide the system into elongated slabs along the Z direction. This way all the nodes would have a similar number of atoms to deal with.
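As a rough illustration of why the grid shape matters, the Python sketch below estimates the load imbalance (most-loaded processor divided by the average load) for a toy version of this geometry. Everything in it is an assumption made for illustration: the unit box, the density values, the slab thickness, the sphere position and the `imbalance` helper are hypothetical and are not taken from LAMMPS or from Ming's input.

```python
# Toy model (all numbers hypothetical): a unit box with a dense solid slab at
# the bottom, a small dense sphere in the middle, and dilute vapor elsewhere.
RHO_SOLID = 1.0     # assumed relative density of the solid layers
RHO_LIQUID = 0.8    # assumed relative density of the liquid sphere
RHO_VAPOR = 0.01    # assumed relative density of the vapor
SOLID_TOP = 0.0625  # assumed height of the solid slab (fraction of box)
RADIUS = 0.125      # sphere radius, 1/8 of the box dimension


def density(x, y, z):
    """Relative atom density at a point of the unit box."""
    if z < SOLID_TOP:
        return RHO_SOLID
    if (x - 0.5) ** 2 + (y - 0.5) ** 2 + (z - 0.5) ** 2 < RADIUS ** 2:
        return RHO_LIQUID
    return RHO_VAPOR


def imbalance(px, py, pz, n=64):
    """Max load / mean load for a px x py x pz grid of equal-sized bricks.

    The box is sampled on an n x n x n lattice; each sample contributes its
    density to the processor whose brick contains it.
    """
    load = [0.0] * (px * py * pz)
    for i in range(n):
        x = (i + 0.5) / n
        ix = i * px // n
        for j in range(n):
            y = (j + 0.5) / n
            iy = j * py // n
            for k in range(n):
                z = (k + 0.5) / n
                iz = k * pz // n
                load[(ix * py + iy) * pz + iz] += density(x, y, z)
    mean = sum(load) / len(load)
    return max(load) / mean


if __name__ == "__main__":
    # Two ways of arranging 1024 processors over the same box.
    print("8 x 8 x 16 :", imbalance(8, 8, 16))
    print("32 x 32 x 1:", imbalance(32, 32, 1))
```

With these made-up numbers the a X b X 1 grid comes out better balanced than the cube-like one, but not perfectly balanced, because columns passing through the liquid sphere still hold more atoms than columns containing only solid and vapor.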

Hope this helps.

Regards,
Vikas

_Ming_Hu · June 19, 2008, 2:46pm

Thanks for your suggestion.
I specified the processor grid as 32 X 32 X 2 = 2048 (two CPUs per node), but the speed still seems very low.
What information should I check to see where the program spends its time?

Regards,
Ming

sjplimp · June 20, 2008, 2:29pm

> What information do I need to check regarding the time consuming in the program?

LAMMPS prints a summary of timings at the end of its run.

Steve

_Ming_Hu · June 21, 2008, 3:42am

Thank you guys for the suggestions.
I tried setting the processor grid to 32 X 64 X 1 and got a speed of 20 min per 10k steps, which is 120% faster than 32 X 32 X 2. Maybe that’s the fastest speed I can get for this highly heterogeneous system.
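For reference, the timings quoted in this thread can be compared with a few lines of Python. This is only a sketch: the thread never states the time per 10k steps for the 32 X 32 X 2 run, so the 25 min baseline below is an assumption borrowed from the first post, and the helper names are hypothetical.

```python
def speedup(t_old, t_new):
    """Ratio of old to new wall time; values above 1 mean the new run is faster."""
    return t_old / t_new


def percent_faster(t_old, t_new):
    """Throughput gain of the new run over the old one, in percent."""
    return (t_old / t_new - 1.0) * 100.0


# Minutes per 10k steps. Only the 20 min and 6 min figures are tied to a
# specific run in the thread; using 25 min as the baseline is an assumption.
t_baseline = 25.0   # first post (heterogeneous system, 1024 nodes)
t_flat_grid = 20.0  # 32 X 64 X 1 grid
t_pure_vapor = 6.0  # pure vapor system of the same size

if __name__ == "__main__":
    print("speedup vs. baseline:", speedup(t_baseline, t_flat_grid))
    print("percent faster      :", percent_faster(t_baseline, t_flat_grid))
    print("slowdown vs. vapor  :", t_flat_grid / t_pure_vapor)
```

Under that assumption the gain from 25 min to 20 min is a factor of 1.25, and the run is still more than three times slower than the pure vapor case.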

Best regards,
Ming

_Vikas_Varshney2 · June 23, 2008, 2:24pm

Ming, as I told you before, if your system is heterogeneous in the Z dimension, the fastest speed can be achieved with an “a X b X 1” type of grid, since this way all a X b processors would have pretty much identical systems to deal with.

Cheers,
Vikas

_Ming_Hu · June 23, 2008, 3:18pm

Hi, Vikas,

Thanks a lot for your suggestion! I tried the 32 X 64 X 1 processor grid and got the following timing statistics for the whole run:

Loop time: 37522.8667371955817 on 2048 procs for 2689154 atoms

Nbond time/%:   3779.217183   10.0718
Nay-1 time/%:    669.676230    1.7847
Exch  time/%:     16.757650    0.0447
Comm  time/%:    514.460397    1.3711
Fcomm time/%:    545.570858    1.4540
I/O   time/%:    323.925549    0.8633
Other time/%:  31673.258870   84.4106

I was wondering what ‘Other time’ stands for, and why it takes up such a large percentage of the whole run.
BTW: I was using the Fortran version of LAMMPS (LAMMPS2001).
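The numbers in the timing summary above can be cross-checked with a short Python sketch. The values are copied from this post; the dictionary layout and the helper name are hypothetical, and the script says nothing about what LAMMPS2001 actually counts under each label.

```python
# Timing summary copied from the post: label -> (seconds, reported percent).
LOOP_TIME = 37522.8667371955817

TIMINGS = {
    "Nbond": (3779.217183, 10.0718),
    "Nay-1": (669.676230, 1.7847),
    "Exch": (16.757650, 0.0447),
    "Comm": (514.460397, 1.3711),
    "Fcomm": (545.570858, 1.4540),
    "I/O": (323.925549, 0.8633),
    "Other": (31673.258870, 84.4106),
}


def recompute_percentages(loop_time, timings):
    """Recompute each category's share of the loop time, in percent."""
    return {name: 100.0 * secs / loop_time for name, (secs, _) in timings.items()}


if __name__ == "__main__":
    shares = recompute_percentages(LOOP_TIME, TIMINGS)
    for name, (secs, reported) in TIMINGS.items():
        print(f"{name:6s} {secs:14.6f} s  reported {reported:8.4f}%  "
              f"recomputed {shares[name]:8.4f}%")
    total = sum(secs for secs, _ in TIMINGS.values())
    print("sum of categories:", total, "loop time:", LOOP_TIME)
```

The listed categories add up to the loop time, which is consistent with ‘Other’ being the remainder not attributed to any of the named categories; that reading is an inference from the numbers, not something stated in the thread.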

Best regards,
Ming

AJING_CAO · June 23, 2008, 3:22pm

Hi Ming,

LAMMPS2001 uses a spatial decomposition of the simulation domain but no other load balancing, so some geometries or density fluctuations can lead to load imbalance on a parallel machine. That’s probably the problem in your case. Check this: http://lammps.sandia.gov/doc/2001/deficiencies.html

I suggest you use the latest version of LAMMPS.

AC