Adding processors and increasing performance

I have a follow-up question regarding the discussion yesterday. For potentials such as Tersoff, shouldn’t I expect to see a linear increase in performance on multiple processors?
Running on 4 processors as opposed to 1 should be 4x faster, should it not? I recognize that this depends on the number of atoms in the simulation. Below I have 4 different runs, each done on 1 processor and on 4 processors, and I show how much faster the 4-processor run was. I hope I am giving enough information. I am running VMware Ubuntu Desktop and have 4 processors allocated to the virtual machine. I am using the January version of LAMMPS. I have been talking to other researchers (not in the MD field) who say that 4 processors should be 4x faster; however, I am not finding this to be true.

5,760 atoms -> 1.5x

46,080 atoms -> 2.34x

92,160 atoms -> 2.58x

184,320 atoms -> 2.77x

Ben
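
The speedups above can be restated as parallel efficiency, i.e. the measured speedup divided by the number of processes. The following is only a minimal sketch using the four numbers reported in this post; the names `reported_speedups`, `NUM_PROCS`, and `parallel_efficiency` are made up for illustration and are not part of LAMMPS. The efficiency climbs from about 38% for the smallest system to about 69% for the largest.

```python
# Speedups reported in the post for 4 processes relative to 1 process,
# keyed by the number of atoms in the run.
reported_speedups = {5760: 1.5, 46080: 2.34, 92160: 2.58, 184320: 2.77}
NUM_PROCS = 4  # processes used in the parallel runs


def parallel_efficiency(speedup, nprocs):
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    return speedup / nprocs


efficiencies = {
    natoms: parallel_efficiency(s, NUM_PROCS)
    for natoms, s in reported_speedups.items()
}

for natoms, eff in efficiencies.items():
    print(f"{natoms:>7d} atoms: {eff:.1%} of ideal 4x speedup")
```

An efficiency of 1.0 would correspond to the perfect 4x speedup asked about; the values here rise with system size but stay well below that.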

please do a web search on "strong parallel scaling", both for
parallel software in general and with reference to MD, and also read
the discussion of LAMMPS benchmarks on the LAMMPS homepage. that will
answer a lot of your questions.

since you run virtualization (why??), things are more complex and you
also need to look into processor affinity and what other applications
are using host CPUs and to what degree. if you have a modern
sandy/ivy bridge or haswell CPU things are even more complex and
unpredictable. expecting 4x speedup from 4x the number of parallel
tasks is, on today's hardware, at best a 0th order approximation.

axel.

I am running virtualization because I wanted to run on Linux and my PC is not compatible (at least not easily) with dual-booting, so it was easier to just install a virtual machine.

I have allocated 4 processors for sole VMware use and 2 GB of memory (which the simulations do not come close to using at this scale), so I would have thought the speed would not differ much from running natively in Windows. It must differ, though, since I am seeing about a 2.34x speedup on 4 processors while the doc page shows closer to 4x (at least in the upper 3s).

Ben

In addition to the searches Axel suggested, have a look at this: http://en.wikipedia.org/wiki/Amdahl's_law
It’s a nice introductory argument for why going from P -> 4P processors doesn’t make your code run 4 times faster.
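
To make the Amdahl's law argument concrete: if a fraction p of the single-process run time parallelizes perfectly and the rest stays serial, the speedup on N processes is S = 1 / ((1 - p) + p / N). Solving for p gives p = (1 - 1/S) / (1 - 1/N). The sketch below applies this to the speedups reported at the top of the thread. The function names are made up for illustration, and the model is a simplification: it folds communication cost, load imbalance, and any virtualization overhead into one effective serial fraction, so the resulting p is only a rough indicator.

```python
def amdahl_speedup(p, n):
    """Speedup predicted by Amdahl's law for parallel fraction p on n processes."""
    return 1.0 / ((1.0 - p) + p / n)


def implied_parallel_fraction(speedup, n):
    """Invert Amdahl's law: the parallel fraction that would yield this speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


# Speedups reported in this thread for 4 processes, keyed by atom count.
reported_speedups = {5760: 1.5, 46080: 2.34, 92160: 2.58, 184320: 2.77}

for natoms, s in reported_speedups.items():
    p = implied_parallel_fraction(s, 4)
    print(f"{natoms:>7d} atoms: speedup {s:.2f}x -> effective parallel fraction {p:.2f}")
```

Under this model the reported speedups correspond to effective parallel fractions of roughly 0.44 to 0.85. Even p = 0.95 would give only about a 3.5x speedup on 4 processes, which is why 4x is an upper bound rather than an expectation.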

If you are talking about the benchmark page where different potentials
are tested for 32K atoms on 1 and 4 cores, those were all run with
MPI on Linux, on dual hex-core machines. Just run straight MPI from the
command line (no virtualization, etc) and I imagine you will do better.

Steve