Cutoff simulation time problem

I am running a script with a Buckingham potential. I wanted to see the difference between a cutoff of 10 and a cutoff of 15. I am running on 4 processors with 5760 atoms. Can someone help me understand why the one with the higher cutoff is running faster? Is it because I have so few atoms that using multiple processors causes the 10 A run to slow down due to unnecessary communication between processors?

Ben

how much faster?

> I am running a script with a Buckingham potential. I wanted to see the
> difference between a cutoff of 10 and a cutoff of 15. I am running on 4
> processors with 5760 atoms. Can someone help me understand why the one
> with the higher cutoff is running faster? Is it because I have so few
> atoms that using multiple processors causes the 10 A run to slow down
> due to unnecessary communication between processors?

if you can send me back my crystal ball, i could try to give you an answer.

LAMMPS prints out a summary that gives you an estimate of how much
time is spent where. without seeing your input or your output, or
knowing under which circumstances you ran on what machine(s) with
what CPU/OS/etc., it is practically impossible to make any kind of
useful statement. yes, i can imagine a situation where a longer
cutoff is faster in a parallel run, but how should i know whether
that applies to you?

we have told you repeatedly that you need to provide concise and
complete information and not just some vague descriptions and guesses,
if you want a proper answer. it is high time that you make an effort
along those lines.

thanks,
     axel.

I just ran the case on one processor:

For Buck with cutoff 10 -> 55 seconds
For Buck with cutoff 15 -> 38 seconds

I am running on VMware, with Ubuntu Desktop installed on it.

Output for cutoff of 10:

Loop time of 55.3055 on 1 procs for 100 steps with 5760 atoms

Pair  time (%) = 8.40545 (15.1982)
Bond  time (%) = 0.000112772 (0.000203907)
Kspce time (%) = 20.2998 (36.7048)
Neigh time (%) = 0 (0)
Comm  time (%) = 0.0155919 (0.0281922)
Outpt time (%) = 0.0186851 (0.0337852)
Other time (%) = 26.5659 (48.0348)

FFT time (% of Kspce) = 16.3166 (80.3781)
FFT Gflps 3d (1d only) = 1.27571 2.93539

Nlocal: 5760 ave 5760 max 5760 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 16911 ave 16911 max 16911 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 1.68192e+06 ave 1.68192e+06 max 1.68192e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 1681920
Ave neighs/atom = 292
Ave special neighs/atom = 0
Neighbor list builds = 0
Dangerous builds = 0

Output for cutoff of 15:

Loop time of 38.9796 on 1 procs for 100 steps with 5760 atoms

Pair  time (%) = 27.717 (71.1063)
Bond  time (%) = 0.000117779 (0.000302155)
Kspce time (%) = 4.60485 (11.8135)
Neigh time (%) = 0 (0)
Comm  time (%) = 0.0249262 (0.0639467)
Outpt time (%) = 0.0183799 (0.0471526)
Other time (%) = 6.61439 (16.9688)

FFT time (% of Kspce) = 3.20442 (69.5879)
FFT Gflps 3d (1d only) = 1.45782 2.80583

Nlocal: 5760 ave 5760 max 5760 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 29440 ave 29440 max 29440 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 4.71168e+06 ave 4.71168e+06 max 4.71168e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 4711680
Ave neighs/atom = 818
Ave special neighs/atom = 0
Neighbor list builds = 0
Dangerous builds = 0

Script is below:

units metal
atom_style full
kspace_style pppm 1.0e-6

#Data format is read in below
read_data …/…/quartz/quartz.data

pair_style buck/coul/long 10
pair_coeff 1 1 1388.7730 0.36231884058 175.0000
pair_coeff 1 2 18003.7572 0.205204814926 133.5381
pair_coeff 2 2 0 .1 0

timestep 0.001

fix 1 all box/relax x 1.01325 y 1.01325 z 1.01325 vmax 0.001
minimize 1.0e-15 1.0e-15 100000 1000000
unfix 1

replicate ${a} ${b} ${c}

velocity all create 600 23423
fix 1 all npt temp 300 300 0.05 x 1.01325 1.01325 5 y 1.01325 1.01325 5 z 1.01325 1.01325 5
run 100
unfix 1

> I am running a script with a Buckingham potential. I wanted to see the
> difference between a cutoff of 10 and a cutoff of 15. I am running on 4

4 or 1? You said 4, and yet you posted results for 1.

> processors with 5760 atoms. Can someone help me understand why the one
> with the higher cutoff is running faster? Is it because I have so few
> atoms that using multiple processors causes the 10 A run to slow down
> due to unnecessary communication between processors?

Look at your Comm timings. Do cutoff 10 and 15 differ a lot? If not,
then it is not communication.

Look at the other timings - which one differs the most? Then think
about the kspace style you used.

Ray

I posted results for 1 because I was eliminating the possibility that just running on more processors was causing it. I just ran it on one processor and got similar results. The pair time is the main contributor and is much higher for the cutoff of 10.

Regarding kspace_style, I dropped the accuracy from 1.0e-6 to 1.0e-4 and found the cutoff of 10 to run faster. So perhaps with a cutoff of 15 A, the accuracy asked for is achieved faster?

Ben

> I posted results for 1 because I was eliminating the possibility that
> just running on more processors was causing it. I just ran it on one
> processor and got similar results. The pair time is the main
> contributor and is much higher for the cutoff of 10.
>
> Regarding kspace_style, I dropped the accuracy from 1.0e-6 to 1.0e-4
> and found the cutoff of 10 to run faster. So perhaps with a cutoff of
> 15 A, the accuracy asked for is achieved faster?

<sigh>please spend some time understanding what changes in what is
computed when you change the kspace accuracy, before you speculate
(wrongly).</sigh>

looking at the timing breakdown already tells the story of what is going on.
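
side by side, from the two logs you posted:

              Pair       Kspce      Other      Loop
10 angstrom:   8.4 s     20.3 s     26.6 s     55.3 s
15 angstrom:  27.7 s      4.6 s      6.6 s     39.0 s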

when running with long-range electrostatics you don't have one
"cutoff" to consider but two: one for the real-space electrostatics
and one for the reciprocal-space electrostatics. the second is
implied, estimated from the energy convergence for kspace. with that
in mind, for a given accuracy you automatically change the second if
you change the first. that determines how much of the computation is
done in real space and how much in reciprocal space. now the real-space
and reciprocal-space calculations have different scaling limits
( O(N**2) and O(N*log(N)) ) and different prefactors and lower-order
terms, which determine how much time it takes to sum each part up to
the desired accuracy. because of that, there is an optimal value for
the distribution of work, which is likely between 10 and 15 angstrom.
as a rule of thumb, the optimum tends to be around the point where
about 20% of the total time is spent in kspace.
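
for reference, this is just the textbook ewald splitting (my notation,
nothing LAMMPS-specific): the 1/r coulomb kernel is split with a
parameter alpha into a short-ranged piece summed directly within the
cutoff r_c and a smooth piece summed over reciprocal vectors k (plus a
constant self-energy term that i omit here):

  \frac{1}{r} = \frac{\mathrm{erfc}(\alpha r)}{r}
              + \frac{\mathrm{erf}(\alpha r)}{r}

  E_{\mathrm{real}}  = \frac{1}{2} \sum_{i \ne j,\; r_{ij} < r_c}
                       q_i q_j \, \frac{\mathrm{erfc}(\alpha r_{ij})}{r_{ij}}

  E_{\mathrm{recip}} = \frac{2\pi}{V} \sum_{\mathbf{k} \ne 0}
                       \frac{e^{-k^2/4\alpha^2}}{k^2}
                       \Big| \sum_j q_j e^{i \mathbf{k} \cdot \mathbf{r}_j} \Big|^2

for a fixed accuracy, a larger r_c lets you pick a smaller alpha, and a
smaller alpha makes the gaussian factor e^{-k^2/4\alpha^2} decay
faster, so fewer k-vectors are needed. that is the mechanism that
shifts work between Pair and Kspace.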

to be perfectly clear on this: the accuracy for the coulomb part is
theoretically essentially independent of the real-space cutoff, since
you effectively compute *all* coulomb up to infinity through combining
Pair and Kspace. always. the only difference you have is in the
non-coulomb terms, which have no equivalent representation in Kspace.

of course, reducing the kspace accuracy will reduce the amount of
work done in kspace by changing the reciprocal-space cutoff.

axel.

So, just trying to understand what you are saying.

You are basically saying that when I use 10 angstroms as the cutoff, more of the computation is being done in real space as opposed to reciprocal space, and changing to a 15 A cutoff allows more computation to take place in reciprocal space. So changing the cutoff affects how much computation goes on where. Is there a relationship that describes this? Eventually, as the cutoff continues to increase, the run will take longer. So I am not sure why a number between 10 and 15 is optimal (though I agree, based on performing my own calculations).

You already did a great job of explaining a lot of the mechanics that I did not understand. Could you help me understand why decreasing the cutoff to 10 A results in more computation in real space as opposed to reciprocal space?

Ben

> So, just trying to understand what you are saying.
> You are basically saying that when I use 10 angstroms as the cutoff,
> more of the computation is being done in real space as opposed to
> reciprocal space, and changing to a 15 A cutoff allows more
> computation to take place in reciprocal space. So changing the cutoff
> affects how much computation goes on where. Is there a relationship
> that describes this? Eventually, as the cutoff continues to increase,
> the run will take longer. So I am not sure why a number between 10 and
> 15 is optimal (though I agree, based on performing my own calculations).

i already explained this. just re-read my previous e-mail.

> You already did a great job of explaining a lot of the mechanics that
> I did not understand. Could you help me understand why decreasing the
> cutoff to 10 A results in more computation in real space as opposed to
> reciprocal space?

it doesn't. it is the other way around, as is clearly visible in the
timing results.

now all you have to do is the reasonable thing in this case, which is
to pick up a textbook about MD that properly explains how ewald
summation works (pppm is just a gimmick to approximate the ewald
summation using a grid and fourier transforms; the basic idea is the
same), think about it for a bit instead of writing another e-mail,
and *then* come back if you still have questions.

axel.

Axel, I feel like the emails are conflicting, though. Real-space computation takes longer than reciprocal space: O(N^2) as opposed to O(N log N). The cutoff of 10 is taking longer, meaning I would think more computation is taking place in real space.

Ben

> Axel, I feel like the emails are conflicting, though. Real-space
> computation takes longer than reciprocal space: O(N^2) as opposed to
> O(N log N). The cutoff of 10 is taking longer, meaning I would think
> more computation is taking place in real space.

they are not. look at the facts and figure it out. it is all there,
but you are not thinking things through and not paying attention to
the details, and thus you are drawing the wrong conclusions.

real-space with a cutoff is not O(N^2), it's O(N).

As you increase the cutoff, pair time goes up and kspace time goes
down. What the optimal choice of cutoff is (if run time is the only
thing you care about) is system, machine, and proc count dependent.
There is no magic answer - you have to do some runs and figure out
the optimal choice empirically.
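
If it helps, here is an untested sketch of how such a scan could be
scripted in one input with the standard variable/next/jump loop (data
file path and settings adapted from your posted script; adjust to
your setup):

# loop a short benchmark over several real-space cutoffs
variable rc index 8 10 12 15
label loop
clear
units metal
atom_style full
kspace_style pppm 1.0e-6
read_data quartz.data                # adjust to your data file path
pair_style buck/coul/long ${rc}      # cutoff taken from the loop variable
pair_coeff 1 1 1388.7730 0.36231884058 175.0000
pair_coeff 1 2 18003.7572 0.205204814926 133.5381
pair_coeff 2 2 0 0.1 0
timestep 0.001
velocity all create 600 23423
fix 1 all nve                        # simple integrator, just for timing
run 100                              # compare Loop/Pair/Kspce time per cutoff
next rc
jump SELF loop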

Note that for a Buckingham potential there is a non-Coulombic
part, so if you change the cutoff you are also changing your model.

Steve

> real-space with a cutoff is not O(N^2), it's O(N).

sorry, that was my sloppy formulation.

what i meant to say was that the cost of the real-space computation
grows with the *cutoff* to a higher order, and that growth is actually
cubic, not quadratic. linear scaling applies to system size, which was
not the topic of the discussion.
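
you can actually see the cubic growth in your own logs: the average
neighbors per atom went from 292 (10 angstrom) to 818 (15 angstrom),
i.e. a factor of 2.80. assuming the default 2 angstrom neighbor skin
for metal units, the ratio of the neighbor-sphere volumes is
((15+2)/(10+2))^3 = 2.84, which matches almost exactly.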

> As you increase the cutoff, pair time goes up and kspace time goes
> down. What the optimal choice of cutoff is (if run time is the only
> thing you care about) is system, machine, and proc count dependent.
> There is no magic answer - you have to do some runs and figure out
> the optimal choice empirically.

...and we also should not forget to mention paul's fix tune/kspace.
http://lammps.sandia.gov/doc/fix_tune_kspace.html
that can automate parts of the empirical optimization process.
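
usage is a one-liner, something like the following (a sketch; check
the doc page above for the exact syntax and its limitations):

fix 2 all tune/kspace 100   # every 100 steps, re-tune coulomb cutoff
                            # and kspace settings for speed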

axel.

Oh, OK. If it is O(N) and not O(N^2), then I understand it now. Sorry, I thought that O(N**2) meant that it was squared.

Ben