ERROR: Balance produced bad splits (balance.cpp:612)

Dear LAMMPS developers and users,

Our research group investigates laser ablation of metals using a combined TTM-MD approach. A typical MD configuration is bounded by a box that is strongly elongated in one direction, with a ratio of sides of up to 1/1000. For instance, in a system of about 44 million atoms, the problem arises in the dynamic balancing procedure when the simulation domain is the orthogonal box = (0 0 0) to (162.28 162.28 140003). The calculation was performed on 1800 procs. The error message is

"Imbalance factor: 12.9855

ERROR: Balance produced bad splits (balance.cpp:612)"

As far as we know, this problem has not yet been discussed by the LAMMPS community. When starting from the restart file, the problem appears immediately, which should make it easier to track down and fix. The restart file is zipped and is 2.2 GB in size; restart file version = 21 Sep 2012.
The file is located here. The problem is also seen in the latest version, 14 Aug 2013.

OS: Scientific Linux SL release 5.5 (Boron)

Kernel: 2.6.32.20 #5 SMP

Arch: x86_64

Sincerely,

Dr. Mikhail Povarnitsyn

it is practically impossible to debug such a big calculation.

can you reproduce this error with a (much) smaller system?

do you need dynamic load balancing at all?
do you have such an unpredictable particle distribution?

if i understand you correctly, there is mostly vacuum in z-direction.
wouldn't it then be smarter to use the processors keyword to reduce
the number of processors in z-direction? by default LAMMPS assumes a
homogeneous particle distribution. it may be sufficient to start with
limiting the number of processors in z-direction to a small number
corresponding to the number of particles in z and then adjust this
with just calling balance once?

axel.
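
To put a rough number on the point about vacuum in the z-direction, here is a minimal Python sketch. It estimates the imbalance factor (maximum atoms per processor divided by the average) when the box is cut into equal slabs along z but the atoms fill only part of it. The function name, the uniform-density toy model, and the filled fraction of 0.05 are illustrative assumptions, not values from the actual simulation.

```python
def uniform_slab_imbalance(pz, filled_fraction):
    """Imbalance factor for pz equal z-slabs in a toy model where atoms
    uniformly fill only the first `filled_fraction` of the box length."""
    if pz < 1 or not 0.0 < filled_fraction <= 1.0:
        raise ValueError("need pz >= 1 and 0 < filled_fraction <= 1")
    shares = []
    for i in range(pz):
        lo, hi = i / pz, (i + 1) / pz
        # Overlap of slab [lo, hi] with the filled region [0, filled_fraction]
        overlap = max(0.0, min(hi, filled_fraction) - lo)
        shares.append(overlap / filled_fraction)  # fraction of all atoms in slab
    average_share = 1.0 / pz
    return max(shares) / average_share


# Hypothetical example: atoms occupy 5% of the box length in z.
for pz in (1, 10, 100, 1800):
    print(pz, uniform_slab_imbalance(pz, 0.05))
```

In this toy model the imbalance grows with the number of equal slabs until a slab fits entirely inside the filled region, which is why limiting the processor count in z, or moving the cut planes, helps.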

Dear Axel,

Thanks for the quick reply.
We need the void ahead of the target. For instance, we are interested in the cluster size distribution 1 nanosecond after the laser heating.
The figure below shows a time-space diagram of the density distribution (projected onto the z-axis).
The laser pulse arrives from the left and hits an aluminum foil of 2.5 µm thickness at time = 0; the pulse duration is only 100 femtoseconds.
We analyze the front spallation and evaporation. The lateral x-y dimensions of the target are on the order of tens of nanometers, while the dimension z normal to the target can be as long as tens of micrometers.
This means that, despite the large void fraction in the z-direction, the material "weight" in this direction is much greater than in the x and y directions.

For different laser intensities, the speed of the droplets and the evaporation rate of the target material will differ, so the density distribution is "unpredictable". That is why we rely on dynamic balancing and use the following fix:
"fix 2 all balance 10000 xyz 20 1.1 out tmp.balance".
The problem is that the program runs for the first 400 000 steps and then dies with the message "ERROR: Balance produced bad splits (balance.cpp:612)". We could do without dynamic balancing, but we see that it does significantly speed up our calculations.
What would be an acceptable file size and number of procs for tracking down this bug? Any other recommendations would also be very helpful.

Sincerely,
Mikhail
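
The idea behind rebalancing along one axis can be sketched in a few lines of Python: move the cut planes so that each slab holds about the same number of atoms. This toy version places the cuts at quantiles of the z-coordinates, which is a deliberately simplified stand-in and not the iterative algorithm that fix balance uses. All names and the synthetic coordinates are assumptions made for illustration.

```python
import bisect
import random


def imbalance(z, cuts):
    """Max atoms per slab divided by the average, for cut planes `cuts`
    (a sorted list that includes both box ends)."""
    counts = [0] * (len(cuts) - 1)
    for zi in z:
        idx = bisect.bisect_right(cuts, zi) - 1
        idx = min(max(idx, 0), len(counts) - 1)  # clamp atoms on the box faces
        counts[idx] += 1
    return max(counts) / (len(z) / len(counts))


def quantile_cuts(z, nslabs, zlo, zhi):
    """Cut planes at equal-count quantiles of z (toy stand-in for balancing)."""
    zs = sorted(z)
    inner = [zs[(k * len(zs)) // nslabs] for k in range(1, nslabs)]
    return [zlo] + inner + [zhi]


# Synthetic data: all atoms sit in the first 20% of a box of length 100.
random.seed(0)
z = [random.uniform(0.0, 20.0) for _ in range(10000)]
nslabs = 8
uniform = [100.0 * k / nslabs for k in range(nslabs + 1)]
balanced = quantile_cuts(z, nslabs, 0.0, 100.0)

print("uniform cuts: ", imbalance(z, uniform))
print("quantile cuts:", imbalance(z, balanced))
```

With equal-width slabs most of the atoms land in the first two slabs, while the quantile cuts give each slab nearly the same count.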

The fewer procs and fewer atoms the better. Having the balance file output
of previous splits would also be helpful. Also the input script.
Is the restart file failing on the initial re-balance?

Steve
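
When looking through the splits, one thing worth checking is whether any cut planes coincide or are out of order, which is the meaning the LAMMPS error documentation gives for "Balance produced bad splits". Below is a minimal Python check, assuming the split positions for one dimension have already been read into a list. The parsing step is left out because the layout of the balance output file is not shown in this thread, and the function name is hypothetical.

```python
def find_bad_splits(splits):
    """Return the indices i where splits[i] >= splits[i + 1], i.e. where two
    neighbouring cut planes coincide or are out of order."""
    return [i for i in range(len(splits) - 1) if splits[i] >= splits[i + 1]]


# Hypothetical split lists along one dimension.
print(find_bad_splits([0.0, 0.25, 0.5, 1.0]))  # strictly increasing
print(find_bad_splits([0.0, 0.3, 0.3, 1.0]))   # two planes on top of each other
```

Running this over the splits recorded before the failure would show whether the cut planes were already drifting together.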