[lammps-users] Improve parallel efficiency

dear xu,

thank you all for your replies. the system is actually empty along the z
direction in the upper part of the box, and it is running on Intel
CPUs. The empty space must be preserved because in some related
cases it will be filled. I usually cut the system (assign processors)
along the z direction, which I find produces results without
numerical errors for my specific system, but this means that some of
the CPUs have no atoms to work on at all. If there is a technique that
can assign regions of different sizes to different processors, that
would be very helpful. thanks!

this sounds like you should try using the processors keyword
with either

processors * * 1

or

processors * * 2

in the second case you have to make sure that the "dense" area
of your system is symmetric around the middle of your simulation
box. by restricting the number of processors assigned in the
z direction you make sure that each processor gets about an
equal number of "local" atoms, i.e. an equal amount of work to do.
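
to make the load-balance argument concrete, here is a rough sketch in
python (not LAMMPS code). it assumes a toy model that is not taken from
your input: atoms are spread uniformly over one z interval of a box of
unit height, and the box is cut into equal-thickness slabs in z. the
function names and all numbers are made up for illustration.

```python
# Toy model (an assumption, not LAMMPS internals): atoms fill only the
# interval z_lo <= z < z_hi of a box normalized to height 1, and the box
# is split into pz slabs of equal thickness along z.

def atoms_per_z_slab(n_atoms, z_lo, z_hi, pz):
    """Return the number of atoms that land in each of the pz z slabs."""
    counts = []
    for k in range(pz):
        lo, hi = k / pz, (k + 1) / pz
        # Length of the overlap between this slab and the filled region.
        overlap = max(0.0, min(hi, z_hi) - max(lo, z_lo))
        counts.append(n_atoms * overlap / (z_hi - z_lo))
    return counts


def imbalance(counts):
    """Largest per-slab load divided by the mean load (1.0 = balanced)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


if __name__ == "__main__":
    # Hypothetical case: 8000 atoms in the bottom quarter of the box.
    for pz in (1, 2, 8):
        counts = atoms_per_z_slab(8000, 0.0, 0.25, pz)
        print(pz, counts, imbalance(counts))
```

in this toy case the imbalance is 1.0 for one slab in z, 2.0 for two
slabs and 4.0 for eight slabs, i.e. the busiest processor does four
times the average work. if the filled region is instead centered on the
middle of the box (say 0.375 to 0.625), two slabs are balanced again,
which is why the second variant needs a symmetric system.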

it is really difficult to make more specific suggestions
without knowing more about the details of your system.
can't you produce a small test input and post it? that
would make it possible to experiment with the parameters
and tune them for optimal performance. LAMMPS does have
some hooks that can be adjusted and have potential for
significant speedups.

a second option to improve performance would be to experiment
with the OpenMP/MPI hybrid parallelization that our group
has been working on recently. we are still in the process of
properly validating all converted potentials, but current
benchmarks indicate that there will be at the very least
a 2x performance gain at high node counts across the board.
depending on the properties and potentials used in the system,
and in combination with other adjustments of input files, we
have seen over 10x speedup in one specific case.

see the attached chart as a simple example.

cheers,
   axel.

openmp-mpi-hybrid-speedup.pdf (30.4 KB)