"Balance" commands and varying processor allocations by region

Hi,

Thanks for the new “balance” command. It has been very useful so far. I am wondering whether I can use it to further optimize the current structure I am simulating:

Two CNTs form a perpendicular cross-junction, with one resting on top of the other.

The computational cell has dimensions 2L x 2L x 2H.

Tube 1 lies along the x-axis from x = -L to L; assume it sits at a height of H.

Tube 2 lies along the y-axis from y = -L to L, resting on top of Tube 1; assume it sits at a height of 2H.

Due to the largely one-dimensional nature of CNTs (the length of a tube is much greater than its diameter), there is significant empty space in the computational cell, and reaching a reasonable “balance factor” is difficult even with various attempts at the “balance” command.
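For reference, the kind of attempt I mean is roughly the following (the grid, threshold, and iteration count are just placeholders, not my exact settings):

  processors * * 1                 # let LAMMPS choose a Px x Py x 1 grid
  balance 1.2 shift xy 20 1.05     # shift the x/y cut planes, up to 20 iterations

Even when the shift converges, the balance factor stays high because so much of the cell is empty.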

The ideal processor configuration would consist of two different, overlaid grids. The allocation of processors in the x and y directions would change with the z coordinate, to reflect the significant change in physical configuration between Tube 1 at height H and Tube 2 at height 2H.

Let’s assume I am distributing 1000 processors. An ideal processor allocation would be:

Px x Py x Pz = 500 x 1 x 1 for z = 0 to H (to cover the “one-dimensional” tube along the x-axis)

Px x Py x Pz = 1 x 500 x 1 for z = H to 2H (to cover the “one-dimensional” tube along the y-axis)

I am not sure whether this violates the constraints of processor allocation, but I am wondering if some combination of the “balance” command and other commands could be used to create these two regions of processor allocation, which would greatly reduce the balance factor.
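As far as I can tell, the “processors” command only accepts a single global grid for the whole box, i.e. each half of the layout above would on its own look like

  processors 500 1 1     # Px Py Pz for the entire computational cell

so I do not see a way to specify a second, different grid for the upper half of the cell with that command alone.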

Thanks for your time.

-Rich

Hi Rich,

> Hi,
>
> Thanks for the new “balance” command. It has been very useful so far. I am
> wondering whether I can use it to further optimize the current structure I
> am simulating:
>
> Two CNTs form a perpendicular cross-junction, with one resting on top of
> the other.
>
> The computational cell has dimensions 2L x 2L x 2H.
>
> Tube 1 lies along the x-axis from x = -L to L; assume it sits at a height
> of H.
>
> Tube 2 lies along the y-axis from y = -L to L, resting on top of Tube 1;
> assume it sits at a height of 2H.
>
> Due to the largely one-dimensional nature of CNTs (the length of a tube is
> much greater than its diameter), there is significant empty space in the
> computational cell, and reaching a reasonable “balance factor” is difficult
> even with various attempts at the “balance” command.

No surprise here. This is a setup where the scheme by which the balance command adjusts the 3d processor grid is bound to fail.

> The ideal processor configuration would consist of two different, overlaid
> grids. The allocation of processors in the x and y directions would change
> with the z coordinate, to reflect the significant change in physical
> configuration between Tube 1 at height H and Tube 2 at height 2H.

This kind of thing cannot work. You can have only one domain decomposition, and one of its constraints is that you may shift the planes that decompose your global box, but you cannot bend them.
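The most you can do along those lines is shift where the cut planes sit, e.g. something like this sketch (the grid dimensions are just one way to arrange 1000 processors):

  processors 25 20 2     # Px x Py x Pz = 1000
  balance 1.0 z 0.5      # place the single z cut plane at the middle of the box (z = H)

The z cut now coincides with the tube 1 / tube 2 interface, but it is still one flat plane through the entire box.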

> Let’s assume I am distributing 1000 processors. An ideal processor
> allocation would be:
>
> Px x Py x Pz = 500 x 1 x 1 for z = 0 to H (to cover the “one-dimensional”
> tube along the x-axis)
>
> Px x Py x Pz = 1 x 500 x 1 for z = H to 2H (to cover the “one-dimensional”
> tube along the y-axis)
>
> I am not sure whether this violates the constraints of processor
> allocation, but I am wondering if some combination of the “balance” command
> and other commands could be used to create these two regions of processor
> allocation, which would greatly reduce the balance factor.

No. You are talking about 1000 processors. How many atoms do you have in your system, what kind of pair potential are you using, and what is your current scaling limit?

One option that you are currently overlooking is the possibility of including multi-threading as a “fourth dimension” of scaling. Unlike the domain decomposition, multi-threaded styles simply parallelize over atoms. So for very inhomogeneous systems like yours, there is often a benefit to having larger domains, and thus fewer domains without any atoms, and then using threading to get a better distribution of work units per processor.
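As a rough sketch (exact syntax depends on your LAMMPS version, and the thread count is just an example), with the USER-OMP package you would run with fewer MPI tasks and add something like:

  package omp 4          # 4 OpenMP threads per MPI task
  suffix omp             # use the /omp variants of pair/bond/etc. styles where available

and then launch with, e.g., 250 MPI tasks, so that 250 tasks x 4 threads still uses 1000 cores.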

axel.