Also need this.

Stan

Index: KSPACE/pair_lj_cut_coul_long.cpp


Dear Axel and Stan,

Thank you both for your comments.

What were you expecting? The sum is converging, so the values aren't going to change significantly upon increasing the precision. How are you calculating the absolute error? If you are referring to the output in LAMMPS, that is only an estimate. The only way to get the true absolute error is to compare to a very, very high precision Ewald calculation, and that means increasing *both* the real space cutoff and the kspace precision to ridiculously high values so you know that it is converged.

By increasing the precision by one order of magnitude, I expected to see the energy difference decrease by roughly the same factor. For example, |E_p-7 - E_p-8| should be roughly an order of magnitude smaller than |E_p-6 - E_p-7|, and so on, where E_p-X is the energy evaluated to a precision X.

Remember that the error is composed of two parts, real and reciprocal space. In theory you can hold the real space cutoff constant and still increase the accuracy by increasing the kspace precision, but that is only true if you have the correct g_ewald parameter. The kspace error estimator is based on approximations and it isn't perfect, especially if you are going to extremes, so you may also need to set g_ewald manually.

You are right, the error arising from the (neglected) real-space contribution of neighbouring boxes is something to consider. However, as you suggested, we can control this error by choosing the combination of g_ewald and the cutoff accordingly. I took that into account by adding another test case.

If you really think the problem is the approximation for erfc, then you could always just try out the built-in erfc function... (see below)

Thanks for your "patch". I've tried it out and can confirm that the current approximation of the erfc is indeed the main problem.

To demonstrate this, I've carried out a comparison for two different cutoff radii (rc=0.5 nm and rc=0.2 nm) with and without your patch. I only specify the precision and let LAMMPS determine the k-space settings. Since all pairwise distances are shorter than 0.2 nm, I expect the real-space error to be smaller for the shorter cutoff radius, since the employed damping is stronger.

Results (energies from the original 25Feb16 code and from 25Feb16 + "patch", for both cutoffs):

precision  I: rc=0.5, original  II: rc=0.5, patch   III: rc=0.2, original  IV: rc=0.2, patch
10^-6      -0.17174293258645    -0.17175961009093   -0.1724837267265       -0.17249872637515
10^-7      -0.17170600041624    -0.17170444738991   -0.17183502964335      -0.17182761924695
10^-8      -0.1716601235286     -0.17164297840941   -0.17167917832447      -0.17165415807042
10^-9      -0.17167025873511    -0.17164273119036   -0.1716753177966       -0.17164289466746
10^-10     -0.17167495005634    -0.17164262884222   -0.17167449611321      -0.17164227165608
10^-11     -0.17167500295002    -0.17164278325461   -0.17167004962641      -0.17164210060565
10^-12     -0.17167126200061    -0.17164292059098   -0.17166435543163      -0.17164208901417
10^-13     -0.1716649212963     -0.17164304382069   -0.17165881877719      -0.17164208839668
10^-14     -0.17165706647864    -0.17164315141781   -0.17165413102677      -0.17164208826725
10^-15     -                    -                   -                      -0.17164208827708
10^-16     -                    -                   -                      -0.17164208828237

Comparing Results III & IV, you can clearly see the effect of the erfc approximation. With the approximation the error is O(10^-5), whereas it is O(10^-11) with the built-in function.

Of course, this is not a rigorous convergence test. However, the results seem to suggest that in both cases the error introduced by the erfc approximation is the main contribution to the total error.

Best wishes,

Peter

Well, that is a surprise to me. Can you look at your timings to see how much slower the built-in erfc is than the approximation? If the difference is small then perhaps we can update the src code, especially since the default is to use the table anyway.

Stan

Dear Stan,

I wrote a small C++ programme (find erfc.C attached) to compare the performance of the "exact" evaluation of erfc using the implementation in cmath (Intel libraries) and the single precision approximation currently employed in LAMMPS.

The programme was compiled with icc, version 14.0.2 20140120. Here are the results I get for 10^9 evaluations of erfc.

erfc.C (1.85 KB)

Peter,

Nice program. I'll talk to Steve first, but I think we should switch to the built-in erfc.

Thanks,

Stan

Peter,

I ran your test code, and the built-in function is actually faster (see below). But when I run the in.rhodo script in /bench, the built-in function is about 60% slower than the approximation. I'm not sure what the discrepancy is, but the built-in function does slow down LAMMPS significantly. Maybe your simple code is able to vectorize and LAMMPS is not...

g++ erfc.C --std=c++11 -O3

./a.out 100000000 1
Settings: 100000000 approximate evaluations.
It took 2460.64 ms.

./a.out 100000000 0
Settings: 100000000 exact evaluations.
It took 647.219 ms.

Rhodopsin benchmark for 10 timesteps using Makefile.mpi:
tabulation (default): 4.76533 s
approximation for erfc: 9.19292 s
built-in erfc: 14.7215 s

Dear Stan,

Thank you for testing this.

Do you think it would still be worthwhile adding an additional compile- or run-time option for the user to choose between the built-in function and the approximation?

Best wishes,

Peter


i think the approximation should be replaced with a double precision compatible approximation. this does exist, it will be somewhat slower, but on most machines, it should still be significantly faster than erfc() from libm.

this is especially true for many versions of libm from glibc on 64-bit x86 platforms, as present on most linux machines. for those, it would also be significantly beneficial to replace exp(), log(), pow(). there is already a small library of such replacement functions for some special cases in src/math_special.h

if somebody wants to trade precision for speed, tabulation is the way to go. i think that already at 12-bit or so, its accuracy is mostly equivalent to the polynomial approximation.

please note that when compiling with intel compilers, usually intel's own math library will be used instead of libm, which doesn't suffer from the fp-mode switching problems of glibc's libm.

axel.

Dear Axel and Stan,

Thank you for looking into this. The double-precision approximation

looks quite promising.

Please let me know, if I can be of any help.

Best wishes,

Peter

quick update.

i've made a test adapting a pair style running a simple SPC/E water system.

with 12-bit tables i have a run time of ~3.7 seconds. with the math library erfc() function i have a run time of ~11.5 seconds. with the replacement approximation, this gets reduced to ~8.1 seconds.

the major time eater is now the call to exp(), which i think i can speed up a bit, too, with a similar approach.

the single precision polynomial approximation takes about ~6.2 seconds. this also uses the exponential, so that could be sped up with a faster (single precision) exponential approximation.

one more comment on stan's description of hacking in erfc(): for full accuracy, you also need to replace the EWALD_F constant with a double precision version of 2*sqrt(1/pi).

axel.