Thank you for the reply. I think the processors keyword does not help in my case, since the particles are situated in the middle of the simulation box, so that all three directions are affected by the vacuum.
What do you mean by ‘dual level parallelism with OpenMP-enabled’ pair styles? I have hardly any experience with MPI coding. Is it necessary to change code in LAMMPS for this purpose? Or is it sufficient to add some lines to the qsub script to activate some OMPI intrinsic functions?
Axel Kohlmeyer 01/04/11 5:23 PM >>>
Dear lammps users
I want to study the interaction between two wetted rigid particles
(pair_style lj/cut/coul/cut). Only the water molecules are allowed to move
while the atomistic nano particles are fixed in position. Since I want to
observe the formation of water bridges between the particles and avoid
interactions of particles with their own images caused by periodic boundary
conditions, considerably large parts of the simulation cell are empty
(vacuum). I am carrying out simulations on an HPC (
) consisting of modern Intel quad core dual processor machines connected by
Infiniband (40Gbps). The performance of the HPC is comparable with the one
on the lammps webpage running the systems investigated there. However, if
parts of the simulation cell are empty, performance degrades strongly.
Simulations on 8 cores (one node) need the same time as on 16 cores (2
nodes), while those on 32 cores need even longer times. The communication
time rises extremely with every additional node.
I have now reduced the simulation box as much as possible, but I still cannot
use more than 16 cores. Is there any setting in the input file I can choose to
improve the performance although there is empty space in the simulation box?
this is likely to be a load balancing problem. lammps uses a fixed
domain distribution based on box dimension. if you have large vacuum
areas, then you create a load imbalance.
check out the processors keyword. if you use at most two MPI tasks
in the direction where you have the vacuum, you should get better
load balancing and thus better scaling.
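To make the load-balancing argument concrete, here is a rough Python sketch (not LAMMPS code) that counts atoms per domain when a box is cut into a regular grid of equal bricks. The box size, atom count, and the assumption that all atoms sit in the central 40% of each box dimension are made up for illustration, and the helper names are hypothetical.

```python
import itertools
import random


def make_cluster(n_atoms, box, fraction=0.4, seed=0):
    """Place atoms uniformly in a centered sub-box; the rest of the box is vacuum."""
    rng = random.Random(seed)
    lo = [(1.0 - fraction) / 2.0 * b for b in box]
    return [
        tuple(l + rng.random() * fraction * b for l, b in zip(lo, box))
        for _ in range(n_atoms)
    ]


def atoms_per_domain(positions, box, grid):
    """Count atoms in every brick of a regular grid decomposition (zeros included)."""
    counts = {idx: 0 for idx in itertools.product(*(range(g) for g in grid))}
    for pos in positions:
        idx = tuple(
            min(int(p / b * g), g - 1) for p, b, g in zip(pos, box, grid)
        )
        counts[idx] += 1
    return list(counts.values())


def imbalance(positions, box, grid):
    """Busiest domain divided by the mean load; 1.0 means perfectly balanced."""
    counts = atoms_per_domain(positions, box, grid)
    return max(counts) / (sum(counts) / len(counts))


if __name__ == "__main__":
    box = (100.0, 100.0, 100.0)  # hypothetical box edge lengths
    atoms = make_cluster(8000, box)
    for grid in [(2, 2, 2), (4, 2, 2), (4, 4, 2)]:
        print(grid, round(imbalance(atoms, box, grid), 2))
```

With at most two domains per direction, every cut passes through the middle of the cluster, so each domain owns roughly the same number of atoms. With four domains in a direction, the outer bricks contain only vacuum and the inner ones carry the whole load, so adding MPI tasks adds communication without adding useful work.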
the second option would be to switch to using dual level parallelism
with OpenMP-enabled or GPU-accelerated pair styles. this allows you to
keep the domains fairly large and thus have less overhead.
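The "less overhead" point can be illustrated with a purely geometric estimate, sketched below in Python. Each MPI domain has to exchange a ghost shell about one cutoff thick with its neighbors, and the ratio of shell volume to owned volume grows as domains shrink. The edge lengths and the 10 Å cutoff are hypothetical, cubic domains are assumed, and in vacuum regions the shell is mostly empty, so this is only a rough upper bound.

```python
def ghost_to_owned_ratio(edge, cutoff):
    """Volume of the ghost shell around a cubic domain, relative to the domain itself."""
    owned = edge ** 3
    with_ghosts = (edge + 2.0 * cutoff) ** 3
    return (with_ghosts - owned) / owned


if __name__ == "__main__":
    cutoff = 10.0  # hypothetical pair cutoff in Angstrom
    # Same region handled by 8 small MPI domains or by 1 large domain with 8 threads.
    print("8 MPI tasks, edge 20:", ghost_to_owned_ratio(20.0, cutoff))
    print("1 MPI task x 8 threads, edge 40:", ghost_to_owned_ratio(40.0, cutoff))
```

Threads inside one MPI task share the same domain, so the larger domain pays the smaller relative communication cost.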