Performance Variability in LAMMPS

Hello,

I am looking into performance variability of the LAMMPS infrastructure and have a question: has anyone noticed such variability? If so, how extreme is it, and under what conditions? I recently ran 500 parallel LAMMPS instances, each on 320 cores, with the melt benchmark on Blue Waters, and didn't notice much performance variability.

Regards,

Hello,

> I am looking into performance variability of the LAMMPS infrastructure.

what exactly do you mean by "performance variability"?

> I have a question: has anyone noticed such variability? If so, how
> extreme is it, and under what conditions? I recently ran 500 parallel
> LAMMPS instances, each on 320 cores, with the melt benchmark on Blue
> Waters, and didn't notice much performance variability.

when running "melt" across 320 cores, you didn't do a very realistic run,
because at 100 atoms per core you are way past the strong scaling limit of
LAMMPS for such a "fast" pair style. so you are not so much benchmarking
the "physics part" of the code as the "communication and infrastructure"
part, which should have less impact on realistic setups.

why should there be any variability in the first place? there are no random
elements involved that change if you keep running with the same settings
across the same number of nodes; the computations and results should be
identical. you would have to actively insert a random element (e.g.
different seeds for the pRNG in the velocity command) to decorrelate
trajectories, but you'd have to run for many more steps to actually get
statistically independent configurations. if you see any differences, they
are more likely due to the hosting hardware and networking, and how well any
single job is isolated from other jobs on the same machine. for a very
high-end box like blue waters, this should be minimal (or else you
couldn't get good strong scaling for big projects across the entire
machine).
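as an illustration of the seeding mentioned above, here is a minimal
melt-style input deck (a sketch based on the standard 32,000-atom
Lennard-Jones melt setup; the variable name `seed` is an assumption) where
the pRNG seed for the velocity command can be overridden per run with
`-var seed NNN` on the command line, so repeated runs get decorrelated
trajectories:

```
# minimal lennard-jones melt input; run as: lmp -var seed 12345 -in in.melt
variable     seed index 12345          # pRNG seed, overridable per run
units        lj
atom_style   atomic
lattice      fcc 0.8442
region       box block 0 20 0 20 0 20  # 32,000 atoms
create_box   1 box
create_atoms 1 box
mass         1 1.0
pair_style   lj/cut 2.5
pair_coeff   1 1 1.0 1.0 2.5
velocity     all create 3.0 ${seed}    # seed enters only here
fix          1 all nve
run          1000
```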

axel.

Hello,

>> I am looking into performance variability of the LAMMPS infrastructure.

> what exactly do you mean by "performance variability"?

Execution time variation.

>> I have a question: has anyone noticed such variability? If so, how
>> extreme is it, and under what conditions? I recently ran 500 parallel
>> LAMMPS instances, each on 320 cores, with the melt benchmark on Blue
>> Waters, and didn't notice much performance variability.

> when running "melt" across 320 cores, you didn't do a very realistic
> run, because at 100 atoms per core you are way past the strong scaling
> limit of LAMMPS for such a "fast" pair style. so you are not so much
> benchmarking the "physics part" of the code as the "communication and
> infrastructure" part, which should have less impact on realistic setups.

I am not an expert on LAMMPS. I am running experiments to find out when the
hardware/resource infrastructure becomes the source of performance
bottlenecks and performance variation. To elaborate: if I launch N LAMMPS
instances, some instances might show large variation in execution time, and
the total execution time of each batch of jobs might also vary.
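One way to quantify this kind of run-to-run variation (a sketch; the
wall-clock times below are made up for illustration) is to compare the
standard deviation of a batch's execution times to its mean:

```python
import statistics

def variability_stats(wall_times):
    """Mean, standard deviation, and coefficient of variation
    (stdev / mean) of a batch of wall-clock times, in seconds."""
    mean = statistics.mean(wall_times)
    stdev = statistics.stdev(wall_times)
    return {"mean": mean, "stdev": stdev, "cov": stdev / mean}

# hypothetical times from four identical runs of one instance
print(variability_stats([101.2, 99.8, 100.5, 103.1]))
```

A coefficient of variation of a few percent or less would indicate little
run-to-run variability.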

> why should there be any variability in the first place? there are no
> random elements involved that change if you keep running with the same
> settings across the same number of nodes; the computations and results
> should be identical. you would have to actively insert a random element
> (e.g. different seeds for the pRNG in the velocity command) to decorrelate
> trajectories, but you'd have to run for many more steps to actually get
> statistically independent configurations.

> if you see any differences, they are more likely due to the hosting
> hardware and networking, and how well any single job is isolated from
> other jobs on the same machine. for a very high-end box like blue waters,
> this should be minimal (or else you couldn't get good strong scaling for
> big projects across the entire machine).

This is what I am looking for: at what point does the hardware become the
major source of variation?