[lammps-users] performance

Dear lammps users

I want to study the interaction between two wetted rigid particles (pair_style lj/cut/coul/cut). Only the water molecules are allowed to move, while the atomistic nanoparticles are fixed in position. Since I want to observe the formation of water bridges between the particles and avoid interactions of the particles with their own periodic images, considerably large parts of the simulation cell are empty (vacuum).

I am carrying out the simulations on an HPC cluster ( http://www.zid.tuwien.ac.at/zserv/applikationsserver/vienna_scientific_cluster/ ) consisting of modern dual-processor, quad-core Intel machines connected by Infiniband (40 Gbps). For the benchmark systems shown on the lammps webpage, the performance of the cluster is comparable to the published numbers. However, if parts of the simulation cell are empty, performance degrades strongly. Simulations on 8 cores (one node) need the same time as on 16 cores (2 nodes), while those on 32 cores take even longer. The communication time rises sharply with every additional node.

I have now reduced the simulation box as much as possible, but I still cannot make use of more than 16 cores. Is there any setting in the input file that would improve the performance even though there is empty space in the simulation box?

thank you
best regards
Sabine

this is likely to be a load balancing problem. lammps uses a fixed
domain decomposition based on the box dimensions, so every MPI task
gets a sub-domain of equal size. if you have large vacuum areas, some
sub-domains hold few or no atoms and you create a load imbalance.

check out the processors keyword. if you use at most two MPI tasks
in the direction where you have the vacuum, you should get better
load balancing and thus better scaling.
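
as a minimal sketch of the above, assuming the vacuum gap lies along
z (swap the position of the 2 if it is along x or y in your box):

```lammps
# sketch only: assumes the vacuum is along the z direction.
# must appear before the box is defined (read_data, create_box, ...).
# "*" lets lammps choose the grid in x and y; z is limited to 2 domains.
processors * * 2
```

the processor grid that was actually used is printed near the top of
the log file, so you can check that the setting took effect.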

the second option would be to switch to dual-level parallelism
with OpenMP-enabled or GPU-accelerated pair styles. this allows you
to keep the domains fairly large and thus have less overhead.

cheers,
   axel.