Ewald summation is much faster in the newer versions

Aris_Sgouros · May 27, 2015, 7:57am

Dear LAMMPS users,

I am performing MD simulations of a polyethylene slab confined between two graphite walls, and I am calculating the long-range interactions with Ewald summation.

Here are my settings:

pair_style lj/long/coul/long long off 9.085
pair_modify mix arithmetic
kspace_style pppm/disp 0.00001
kspace_modify slab 3.0 gewald/disp 0.28

In the past I used the 1Nov13 version of LAMMPS, and I recently updated to the new version (10Feb15).

Surprisingly, the simulations run ~8x faster with the new version than with the old one!

I'm not saying it's a bad thing, but if possible I would like to know why this happens.

I'm currently doing some test runs with both versions.

The thermo output (ebond, elong, ke, etc.) at the first step of the simulation is identical in both versions for exactly the same configuration and velocity random seed, so the energies and velocities are calculated consistently in both versions.
However, the thermo output and the trajectories start to deviate in the following steps, though that could be normal.

Thank you,
Aris Sgouros

Rolf_Erwin_Isele-Hol · May 27, 2015, 8:06am

Dear Aris,

there have been quite a few changes to pppm/disp, so
it is hard to tell where the speed-up comes from. Can
you please send the LAMMPS log files from both simulations?
They will probably help to answer your question.

Best, Rolf

Rolf_Erwin_Isele-Hol · May 27, 2015, 8:13am

Dear Aris,

there have been many changes in pppm/disp between the
two versions that you use. Can you please provide LAMMPS
logfiles from both of your simulations? This will help to answer
your question.

Best, Rolf

Aris_Sgouros · May 27, 2015, 8:49am

Dear Isele and other LAMMPS users,

thanks a lot for the quick reply!

I'm attaching the logs from both versions, up to 100,000 time steps.

Note that both inputs were identical except for the following line:

(lammps 1 Nov 2013)
compute cpeWall all stress/atom
(lammps 10 Feb 2015)
compute cpeWall all stress/atom NULL

Log files:

LAMMPS (1 Nov 2013)
WARNING: Mixing forced for lj coefficients (…/pair_lj_long_coul_long.cpp:88)
Scanning data file …
1 = max bonds/atom
1 = max angles/atom
1 = max dihedrals/atom
Reading data file …
orthogonal box = (0 0 0) to (71.34 71.34 71.34)
2 by 2 by 4 MPI processor grid
11400 atoms
11286 bonds
11172 angles
11058 dihedrals
Finding 1-2 1-3 1-4 neighbors …
2 = max # of 1-2 neighbors
2 = max # of 1-3 neighbors
4 = max # of 1-4 neighbors
6 = max # of special neighbors
11400 atoms in group mobile
PPPMDisp initialization …
WARNING: Charges are set, but coulombic solver is not used (…/pppm_disp.cpp:284)
Dispersion G vector (1/distance)= 0.28
Dispersion grid = 24 24 75
Dispersion stencil order = 5
Dispersion estimated absolute RMS force accuracy = 0.00257641
Dispersion estimated relative force accuracy = 7.75879e-06
using double precision FFTs
3d grid and FFT values/proc dispersion = 17051 8208
Setting up run …
Memory usage per processor = 11.8161 Mbytes
Step Temp KinEng PotEng E_bond E_angle E_dihed E_vdwl E_long
0 450 15290.214 9775.8564 4987.012 8695.4691 8883.9216 -806.64944 -9766.4487
10000 451.24674 15332.576 6211.3555 4918.8688 5091.5507 9040.4699 -924.18191 -9755.8824
20000 450.16214 15295.723 6227.2353 4964.1433 5050.6228 9088.4736 -972.95874 -9756.8679
30000 446.66885 15177.028 6376.9248 5105.2899 5133.9371 8979.2937 -967.78218 -9745.8209
40000 447.13615 15192.905 6209.1239 4964.0213 4948.2044 9125.036 -931.6129 -9739.1278
50000 450.43173 15304.884 6248.2554 5092.8649 5083.1915 8914.5638 -958.27079 -9756.7917
60000 449.43398 15270.982 6154.7301 5037.4742 4991.7693 8884.4272 -907.68321 -9759.6189
70000 445.07292 15122.801 6031.1735 5009.0292 4860.1582 9032.231 -996.24639 -9750.0422
80000 451.03788 15325.48 6255.4394 5053.0414 4972.4993 8988.5425 -900.24649 -9763.4947
90000 447.18423 15194.539 6312.8685 5001.8127 4969.3465 9108.9125 -878.42156 -9748.0257
100000 449.55362 15275.047 6269.1045 5083.9992 4967.0348 9049.2794 -929.45529 -9747.0126

LAMMPS (10 Feb 2015)
Reading data file …
orthogonal box = (0 0 0) to (71.34 71.34 71.34)
2 by 2 by 4 MPI processor grid
reading atoms …
11400 atoms
scanning bonds …
1 = max bonds/atom
scanning angles …
1 = max angles/atom
scanning dihedrals …
1 = max dihedrals/atom
reading bonds …
11286 bonds
reading angles …
11172 angles
reading dihedrals …
11058 dihedrals
Finding 1-2 1-3 1-4 neighbors …
2 = max # of 1-2 neighbors
2 = max # of 1-3 neighbors
4 = max # of 1-4 neighbors
6 = max # of special neighbors
11400 atoms in group mobile
PPPMDisp initialization …
WARNING: Charges are set, but coulombic solver is not used (…/pppm_disp.cpp:287)
Optimizing splitting of Dispersion coefficients
Using geometric mixing for reciprocal space
Dispersion G vector (1/distance)= 0.28
Dispersion grid = 24 24 75
Dispersion stencil order = 5
Dispersion estimated absolute RMS force accuracy = 0.00257641
Dispersion estimated absolute real space RMS force accuracy = 0.000359341
Dispersion estimated absolute kspace RMS force accuracy = 0.00255123
Dispersion estimated relative force accuracy = 7.75879e-06
using double precision FFTs
3d grid and FFT values/proc dispersion = 17051 8208
Neighbor list info …
1 neighbor list requests
update every 1 steps, delay 10 steps, check yes
master list distance cutoff = 11.085
Setting up run …
Memory usage per processor = 13.8919 Mbytes
Step Temp KinEng PotEng E_bond E_angle E_dihed E_vdwl E_long
0 450 15290.214 9775.8564 4987.012 8695.4691 8883.9216 -806.64944 -9766.4487
10000 453.42 15406.42 6291.9248 4992.9374 5076.4334 9080.873 -973.44642 -9743.2964
20000 450.34991 15302.104 6148.6139 5034.714 4992.3628 9028.4682 -1010.9315 -9738.9365
30000 453.75619 15417.843 6368.6969 5109.2609 4979.0011 9034.4978 -905.12563 -9754.2376
40000 449.68414 15279.482 6219.4403 5130.557 4947.5158 9024.119 -1031.852 -9754.3673
50000 453.09004 15395.208 5968.3884 5022.1993 4938.6168 8832.9661 -930.67787 -9755.0031
60000 448.2621 15231.163 6272.67 5101.9236 4895.3186 9048.1184 -900.08356 -9759.7523
70000 451.82646 15352.274 6026.2791 5015.7375 5018.9754 8842.3171 -1003.7041 -9765.4213
80000 444.81612 15114.075 6352.4327 5017.8085 5130.3529 9031.1848 -947.31591 -9760.2325
90000 451.11329 15328.042 6368.6327 5046.5208 5100.7524 9020.0477 -961.54051 -9750.6403
100000 451.17238 15330.05 6172.3152 5028.9702 4827.118 9084.7992 -889.96278 -9750.7212
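Because MD trajectories are chaotic, small differences in the order of floating-point operations between two code versions are expected to make step-by-step output diverge, so a more meaningful check than matching individual steps is to compare time averages. A small Python sketch of that check, using the Temp column copied from the two thermo tables above; the helper name `summarize` is hypothetical, and the standard error assumes the samples are uncorrelated, which is not verified here.

```python
from statistics import mean, stdev

# Temp column at steps 10000..100000, copied from the two thermo tables above.
# Step 0 is left out because both runs start from the same state.
temp_1nov13 = [451.24674, 450.16214, 446.66885, 447.13615, 450.43173,
               449.43398, 445.07292, 451.03788, 447.18423, 449.55362]
temp_10feb15 = [453.42, 450.34991, 453.75619, 449.68414, 453.09004,
                448.2621, 451.82646, 444.81612, 451.11329, 451.17238]


def summarize(samples):
    """Return (mean, standard error), assuming the samples are uncorrelated."""
    return mean(samples), stdev(samples) / len(samples) ** 0.5


for label, temps in (("1 Nov 2013", temp_1nov13), ("10 Feb 2015", temp_10feb15)):
    avg, err = summarize(temps)
    print(f"{label}: <T> = {avg:.2f} +/- {err:.2f}")
```

With only ten samples per run this is a rough check; longer runs or block averaging would be needed before drawing a firm conclusion, and the same comparison can be applied to E_long, E_vdwl, and the other columns.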

Thank you,
Aris Sgouros

Rolf_Erwin_Isele-Hol · May 27, 2015, 8:52am

Hi,

I think I know where the speed-up comes from.
Can you please provide your LAMMPS input
script and data file?

Best, Rolf

Aris_Sgouros · May 27, 2015, 9:03am

Sure,

Since the data file is very large, I'm attaching some Dropbox links, if that's OK.

https://www.dropbox.com/s/9ryajlaswue1uch/1Nov13_pe.m114a100.in?dl=1

https://www.dropbox.com/s/9hy5hpi5xxf4n2g/10Feb15_pe.m114a100.in?dl=1

https://www.dropbox.com/s/wcpoeoe5cuk4go4/POS.M114.A100.data?dl=1

Thanks,
Best, Aris

Rolf_Erwin_Isele-Hol · May 27, 2015, 9:12am

The speed-up comes from the fact that you use the arithmetic mixing
rule but have only one atom type.

With arithmetic mixing, the old procedure splits the
dispersion coefficients into 7 terms and then performs
the reciprocal-space computation 7 times.
Newer LAMMPS versions check the rank of the
dispersion coefficient matrix and then determine
whether the computation can be done faster.
In your case the rank is 1, so the dispersion coefficients
can be represented by a single term (i.e. no splitting), and
only one reciprocal-space computation is needed. This is
what the "Using geometric mixing for reciprocal space" line
in your new log reports.
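A minimal Python sketch of the linear algebra behind this, written for illustration and not taken from the LAMMPS source. With arithmetic (Lorentz-Berthelot) mixing, C6_ij = 4 eps_ij sigma_ij^6 with eps_ij = sqrt(eps_i eps_j) and sigma_ij = (sigma_i + sigma_j)/2. The binomial expansion of (sigma_i + sigma_j)^6 has exactly 7 terms, each a separable product a_k(i) * b_k(j), which is where the 7 reciprocal-space passes come from. A rank check then shows that a single atom type (or equal sigmas) needs only one term. The helper names (`c6_direct`, `c6_from_seven_terms`, `numerical_rank`) and the eps/sigma values are hypothetical.

```python
from math import comb, sqrt


def c6_direct(eps, sig):
    """C6_ij = 4 * eps_ij * sigma_ij**6 with arithmetic (Lorentz-Berthelot) mixing."""
    n = len(eps)
    return [[4.0 * sqrt(eps[i] * eps[j]) * ((sig[i] + sig[j]) / 2.0) ** 6
             for j in range(n)] for i in range(n)]


def c6_from_seven_terms(eps, sig):
    """Rebuild the same matrix from the binomial expansion of (sigma_i + sigma_j)**6.
    Term k is w_k * a_k(i) * b_k(j), a separable product, so each of the
    7 terms (k = 0..6) could be handled by one reciprocal-space pass."""
    n = len(eps)
    m = [[0.0] * n for _ in range(n)]
    for k in range(7):
        w = comb(6, k) / 16.0  # 4 * C(6, k) / 2**6
        a = [sqrt(eps[i]) * sig[i] ** k for i in range(n)]
        b = [sqrt(eps[j]) * sig[j] ** (6 - k) for j in range(n)]
        for i in range(n):
            for j in range(n):
                m[i][j] += w * a[i] * b[j]
    return m


def numerical_rank(m, rel_tol=1e-10):
    """Rank via Gaussian elimination with partial pivoting and a relative tolerance."""
    a = [row[:] for row in m]
    n_rows, n_cols = len(a), len(a[0])
    scale = max(abs(x) for row in a for x in row) or 1.0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) <= rel_tol * scale:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            f = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= f * a[rank][c]
        rank += 1
    return rank


if __name__ == "__main__":
    # Hypothetical LJ parameters, for illustration only.
    print(numerical_rank(c6_direct([0.1], [3.9])))                 # prints 1
    print(numerical_rank(c6_direct([0.1, 0.07], [3.9, 3.2])))      # prints 2
```

A rank-1 matrix can be written as C6_ij = v_i v_j, i.e. exactly geometric mixing, so the seven passes collapse to one. Whether that alone explains the ~8x overall speed-up depends on how much of the old run time was spent in the dispersion k-space part, which the posted logs do not show.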

You can read more about this in the appendix of this paper:

Best, Rolf

Aris_Sgouros · May 27, 2015, 9:19am

OK, I guess that makes sense.
I'll read that paper to understand the issue more thoroughly.
Thanks a lot!
Aris