[lammps-users] Antw: Re: performance

Dear Axel

Thank you very much again for your help and the calculations.

I still have some questions:

a) You wrote about a pair style lj/cut/coul/cut/omp, but I could not find it on the documentation page. Is it already released, and if yes, in which LAMMPS version (I currently use the 15Dec2010 version)?

b) Concerning boundary conditions: I already tried to use f f f before, but then I had problems with lost atoms caused by evaporating water molecules.
I don't clearly understand from the documentation what happens when using s or m. Are the box dimensions allowed to fluctuate such that all the atoms stay inside the original box? Is the box expected to become very large and slow down the calculations when atoms drift away? What consequences do these styles have?

Best regards

Sabine

Axel Kohlmeyer [email protected] 10.01.11, 19:35 >>>
dear sabine,

Dear Axel,

Thank you for offering support.
I sent you a testcase (data + input) of two wetted rigid particles of radius 2
nm, with centers of mass at (150/150/105) and (150/150/147) in the simulation box.
Calculations using 8 cores needed around 13 s/100 steps, those using 16 cores
around 11.5 s/100 steps; increasing to 24 cores further degrades the
performance.

Looking at the input and data files, could you give me a hint how I should
choose the parameters for the processor keyword?

there are a number of ways in which your input can be improved:

a) do you really need the thermodynamic data at every step?
using “thermo 10”, for example, makes the calculation significantly faster
(see the input sketch after this list).
b) in many cases when you have a “cluster configuration”, it helps to
shift the cluster into the center of the box. that would also allow you to
turn off periodic boundary conditions, which - given the cutoffs that
you are using - seem to be leading to spurious results in any case.
c) even without using threading, lj/cut/coul/cut/omp is faster than
lj/cut/coul/cut due to some refactoring of the code.
d) there is only a small benefit from threading in this specific
situation, if at all.
how much depends a lot on the hardware you have.
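
to make a) - c) concrete, here is a minimal sketch of what the relevant
input lines could look like. the cutoff value and the shift vector are
placeholders and not taken from your actual input:

  boundary        f f f                       # set before read_data: non-periodic, fixed faces
  pair_style      lj/cut/coul/cut/omp 10.0    # /omp variant, same arguments as lj/cut/coul/cut
  thermo          10                          # thermodynamic output every 10 steps instead of every step
  # after read_data, shift the cluster to the box center, e.g.:
  # displace_atoms  all move 0.0 0.0 24.0 units box

the displace_atoms line is just one way to do the centering; you can also
simply edit the coordinates in the data file.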

the biggest problem standing in the way of better performance is
the amount of communication, presumably due to SHAKE. that part usually
does not parallelize very well, and thus there is little that can be
done about it.

however, i was able to run your system this fast using my modified input:

Loop time of 3.06568 on 256 procs (256 MPI x 1 OpenMP) for 100 steps with 32994 atoms
Performance: 5.637 ns/day 4.258 hours/ns 32.619 timesteps/s

Pair  time (%) = 0.287554 (9.37978)
Bond  time (%) = 8.65683e-05 (0.00282379)
Neigh time (%) = 0.0913223 (2.97886)
Comm  time (%) = 1.90087 (62.0049)
Outpt time (%) = 0.17763 (5.79416)
Other time (%) = 0.608214 (19.8395)

this is a performance increase down to a count of about 130 atoms per
processor core, which is actually a very good result. most classical MD
codes stop scaling at a few thousand atoms per processor.

and with the unmodified input i get this performance.

Loop time of 4.27749 on 256 procs (256 MPI x 1 OpenMP) for 100 steps with 32994 atoms
Performance: 4.040 ns/day 5.941 hours/ns 23.378 timesteps/s

Pair  time (%) = 0.407228 (9.52027)
Bond  time (%) = 0.000103446 (0.00241838)
Neigh time (%) = 0.0894838 (2.09197)
Comm  time (%) = 2.3135 (54.0856)
Outpt time (%) = 0.770148 (18.0047)
Other time (%) = 0.697021 (16.2951)

for your reference, here are the differences between the two inputs on just
one processor (the modified input first, then the unmodified one):

Loop time of 95.3823 on 1 procs (1 MPI x 1 OpenMP) for 100 steps with 32994 atoms
Performance: 0.181 ns/day 132.475 hours/ns 1.048 timesteps/s

Pair  time (%) = 70.9656 (74.4013)
Bond  time (%) = 0.000225544 (0.000236463)
Neigh time (%) = 23.6739 (24.82)
Comm  time (%) = 0.0349481 (0.03664)
Outpt time (%) = 0.101207 (0.106107)
Other time (%) = 0.606374 (0.63573)

Loop time of 126.526 on 1 procs (1 MPI x 1 OpenMP) for 100 steps with 32994 atoms
Performance: 0.137 ns/day 175.731 hours/ns 0.790 timesteps/s

Pair  time (%) = 102.148 (80.7326)
Bond  time (%) = 0.000319481 (0.000252501)
Neigh time (%) = 23.5723 (18.6304)
Comm  time (%) = 0.0348785 (0.0275662)
Outpt time (%) = 0.121385 (0.0959368)
Other time (%) = 0.649399 (0.513252)

as you can see, the /omp refactoring and less frequent energy output
reduced the pair time to 70% of the original, and for a single-processor
run communication doesn't matter. with more processors, it dominates
the total time.

axel.

Dear Axel

Thank you very much again for your help and the calculations.

I still have some questions:

a) You wrote about a pair style lj/cut/coul/cut/omp, but I could not find it
on the documentation page. Is it already released, and if yes, in which LAMMPS
version (I currently use the 15Dec2010 version)?

this is part of LAMMPS-ICMS
http://sites.google.com/site/akohlmey/software/lammps-icms

this style works with and without openmp, btw.
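
for completeness, a minimal usage sketch (the cutoff, the rank and thread
counts, the binary name, and the input file name are placeholders, not your
actual settings). the /omp style takes the same arguments as the plain style,
and the thread count is normally picked up from the OMP_NUM_THREADS
environment variable:

  pair_style  lj/cut/coul/cut/omp 10.0

  # shell: 8 MPI ranks with 2 OpenMP threads each
  export OMP_NUM_THREADS=2
  mpirun -np 8 lmp_openmpi -in in.particles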

b) Concerning boundary conditions: I already tried to use f f f before, but
then I had problems with lost atoms caused by evaporating water molecules.

that would indeed be a problem, but you can either choose to ignore the
lost atoms or use reflecting walls. the question really is which model
best represents the physics of what you are looking at. if you do want
periodicity, you should use kspace, too. if you don't, you have to make sure
that you don't create spurious effects, either through too short a coulomb
cutoff or - in case you use periodicity - through having the box so small
that periodic images interact.
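
as an illustration of the reflecting-wall option, a minimal sketch (the fix
ID and reflecting on all six faces are just an example, not a recommendation
for your system):

  boundary  f f f
  fix       walls all wall/reflect xlo EDGE xhi EDGE ylo EDGE yhi EDGE zlo EDGE zhi EDGE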

I don't clearly understand from the documentation what happens when using s or m.
Are the box dimensions allowed to fluctuate such that all the atoms stay inside
the original box? Is the box expected to become very large and slow down the

with "s" the box is expanded and shrunk as needed to fit all the atoms;
with "m" it is the same, but there is a minimum size: the box never shrinks
below the initially specified box dimensions.
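
a minimal sketch of the two choices (which of them fits your model best is
up to you):

  boundary  s s s   # shrink-wrapped: the box follows the atoms in all three dimensions
  boundary  m m m   # shrink-wrapped with a minimum: never smaller than the box in the data file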

calculations when the atoms drift away? What consequences do these styles
have?

an increasing box size caused by atoms drifting away will affect the performance,
since on every reneighboring step the domains are divided evenly across
space, and thus the load imbalance will increase. periodic boundaries or fixed
boundaries with reflecting (or soft) walls avoid this.

axel.