Interface simulation by GPU

Dear Sir,

I am writing to the LAMMPS mailing list to report a potential bug when the GPU package is used for an interface simulation.

Simulation details:

I have ~1000 propionitrile molecules in an elongated simulation box. The box length is 20 nm and the cross-sectional area is ~5*5 nm^2. The pair style lj/cut/coul/long/gpu is used along with the kspace style pppm/gpu. The cutoffs are 20 A and 14 A for the LJ and real-space Coulomb interactions, respectively. The PPPM force accuracy is 10^(-6). The run style is respa, the thermostat is fix nvt (Nose-Hoover), and the temperature is 448 K. The force field parameters are taken from the TraPPE-UA potential. I am aware of the conditions needed to properly simulate interfaces. I have attached my input and data files for more details.
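(For reference, a minimal sketch of the settings described above; this is not the attached input file, and the fix ID, thermostat damping constant, and rESPA loop factor/level assignments are illustrative only:)

units           real
pair_style      lj/cut/coul/long/gpu 20.0 14.0    # 20 A LJ cutoff, 14 A real-space Coulomb cutoff
kspace_style    pppm/gpu 1.0e-6                   # PPPM force accuracy of 10^-6
run_style       respa 2 2 pair 1 kspace 2         # rESPA with kspace on the outer level
fix             1 all nvt temp 448.0 448.0 100.0  # Nose-Hoover thermostat at 448 K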

I am using one GeForce GTX 660 Ti as my GPU and the OS is CentOS 6.4. I have installed the GPU drivers as well as the LAMMPS GPU library. I was able to successfully reproduce the provided examples in the ~/lammps/examples/gpu directory.

Problem description:

When the /gpu suffix is added to the lj/cut/coul/long pair style, the pressure components in the x and y directions are large negative numbers (about -130 atm). However, the liquid density and vapor pressure for my system are consistent with what is published by the TraPPE developers. Having a large negative Pxx and Pyy results in a large positive surface tension (~140 mN/m) for propionitrile at T = 448 K, which is not physically reasonable. The vapor pressure is ~8 atm, the liquid density is ~0.6 g/cc, and the surface tension should be ~10 mN/m.
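(For reference: for a slab with its long axis along z and two liquid-vapor interfaces, the surface tension is usually estimated as gamma = (Lz/2)*[Pzz - (Pxx + Pyy)/2], so with Lz = 20 nm a Pxx and Pyy of about -130 atm do indeed translate into roughly 140 mN/m. A minimal sketch of accumulating this in a LAMMPS input follows; the variable and fix IDs are illustrative, and the unit conversion from atm*A to mN/m is omitted:)

variable st equal 0.5*lz*(pzz-0.5*(pxx+pyy))   # instantaneous surface-tension estimate
fix stavg all ave/time 10 100 1000 v_st        # time-average it over 1000-step windows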

Things that I have tried that did not help:

  1. Changing the run style to verlet.

  2. Changing the time step to as low as 0.5 fs.

  3. Using Berendsen method instead of Nose-Hover for thermostating.

  4. Changing the number of threads in the “package gpu force/neigh” command.

  5. Changing the GPU/CPU load split in the force calculation (see the sketch after this list).

  6. Using the pppm command with no GPU flag.

  7. Using ewald rather than pppm for kspace calculations.

  8. I also tried a molecule without any partial charges, such as butane. The problem is still there. For butane, lj/cut/gpu is used, of course.

  9. Different temperatures, numbers of molecules, and longer equilibration periods did not help either.

  10. Running on a single core without using mpirun.
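(A sketch of the kind of package-command variations referred to in items 4 and 5, using the pre-2014 "package gpu" syntax of first GPU ID, last GPU ID, and CPU/GPU split fraction; the exact values and keywords tried are not reproduced here, and these lines are illustrative only:)

package gpu force/neigh 0 0 1.0   # pair force and neighbor build on the GPU, all pair work on the GPU
package gpu force/neigh 0 0 0.5   # split the pair work 50/50 between CPU and GPU
package gpu force 0 0 -1          # neighbor build on the CPU, dynamic CPU/GPU load balancing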

Things that I have tried that did help:

  1. Removing the gpu suffix from the lj/cut/coul/long/gpu pair style. Using the exact same data and input files with the lj/cut/coul/long/opt pair style reproduces reasonable values for the surface tension as well as the vapor pressure and liquid density.

  2. In a bulk fluid simulation everything turns out to be OK when the GPU is used. Energy and pressure values are fine. Whatever the problem is, it only shows itself when an interface is being simulated.

Thank you for the time and effort you put in this.

Ahmadreza F. Ghobadi

Molecular Simulation Lab.

Chemical and BioMolecular Eng. Dep.

The University of Akron. Akron OH

in.1080pron (1.38 KB)

30pron.data (6.17 KB)

there are two essential pieces of information missing here:

- did you compile the gpu library for single precision, mixed
precision or double precision support?
- what version of LAMMPS are you using?

axel.

I have installed the gpu library for “single precision”

I am using the Sep. 4, 2013 version of LAMMPS.

> I have installed the gpu library for "single precision"

well, then you are comparing apples and oranges here. before you can
claim that there is a bug in the GPU code, please first test with full
double precision. the stress tensor accumulation is particularly
sensitive to floating point truncation errors (and thus also choice of
origin), which may explain the discrepancy that you are seeing.

axel.

AFG:

I have not looked at your simulation script in detail but I see a similar problem with a slightly modified gpu/rhodo example (attached). Incorrect output results from using the -sf gpu flag.

If the example multi-line thermo output is replaced with the fix ave/time outputs then the time-averaged pressures in the x and y directions (but not z?) do not track the instantaneous values (stemming from an incorrect pair contribution to the virial).

If the time average “1 2 10” parameters are replaced with “1 1 10” the time average values match the instantaneous values.
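(For reference, the relevant lines presumably look something like the following sketch; the fix ID and the use of the built-in thermo_press compute are assumptions, not taken from the attached script:)

fix tpress all ave/time 1 2 10 c_thermo_press[1] c_thermo_press[2] c_thermo_press[3]
thermo_style custom step temp pe press pxx pyy pzz f_tpress[1] f_tpress[2] f_tpress[3]
thermo 10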

Not sure how much this clears up, but the problem may be the time-averaging of the pair contribution to the virial w/ gpu pair calculation.

Mike

in.gpu.rhodo-example-mod (892 Bytes)

> AFG:
>
> I have not looked at your simulation script in detail but I see a similar
> problem with a slightly modified gpu/rhodo example (attached). Incorrect
> output results from using the -sf gpu flag.

you need to be careful with your wording here. you can only claim that
the gpu values are "incorrect" if you compare with GPU code that has
been compiled for double precision. typically that is not the case
(for better performance). so far all *discrepancies* between GPU code
and CPU code have been tracked down to truncation errors due to
single precision floating point math. so the GPU code is correct, but
the "error" is the error of floating point math itself.

> If the example multi-line thermo output is replaced with the fix ave/time
> outputs then the time-averaged pressures in the x and y directions (but not
> z?) do not track the instantaneous values (stemming from an incorrect pair
> contribution to the virial).
>
> If the time average "1 2 10" parameters are replaced with "1 1 10" the time
> average values match the instantaneous values.

i don't understand what this should prove. of course the first should
give different results from the second and from the instantaneous,
since in the first case you average over the current time step and the
one before (not thermo output, but time step), whereas in the second case
you only output the current step data. thus either you are not
presenting your case properly or you are chasing a wild goose.

axel.


The tpress[1-3] columns are from the input script and represent Pxx, Pyy, and Pzz from
the fix ave/time command. The 1-2-10 time-averaged data for Pxx and Pyy
don't have the correct sign. I admit that I have not looked at the code
enough to figure out how the pair contribution to the virial is accumulated
(precision problems?), but I don't think this result makes sense.

I deleted the Lx and Ly thermo output to keep the columns narrow:

1-1-10 data:
Step Temp PotEng Press Pxx Pyy Pzz tpress[1] tpress[2] tpress[3]
0 301.19057 -2007.0588 9527.9546 8727.6389 9431.037 10425.188 8727.6389 9431.037 10425.188
10 532.36158 -19359.594 9568.2028 9097.4479 9650.1436 9957.017 9097.4479 9650.1436 9957.017
20 525.42517 -21613.464 6663.3473 7064.0017 7305.6451 5620.3952 7064.0017 7305.6451 5620.3952
30 497.5447 -24438.359 6595.9429 6677.9405 6931.9217 6177.9664 6677.9405 6931.9217 6177.9664
40 453.43381 -27531.808 4959.8295 4844.3067 5896.2098 4138.9722 4844.3067 5896.2098 4138.9722
50 401.78597 -30676.329 4552.9046 5174.4512 4808.0724 3676.19 5174.4512 4808.0724 3676.19
60 353.56978 -33993.705 2043.1117 2784.583 2417.8875 926.86456 2784.583 2417.8875 926.86456
70 311.89867 -37288.24 648.30769 1504.5901 566.85942 -126.5265 1504.5901 566.85942 -126.5265
80 270.92845 -39919.537 682.29261 1301.7442 1776.3196 -1031.186 1301.7442 1776.3196 -1031.186
90 239.93962 -42437.945 -1510.9166 -1153.919 -1134.4337 -2244.3969 -1153.919 -1134.4337 -2244.3969
100 214.82488 -44580.405 -1629.6872 -1179.8499 -1023.9389 -2685.273 -1179.8499 -1023.9389 -2685.273
110 197.01021 -46531.641 -2116.1938 -1699.3278 -1524.9171 -3124.3364 -1699.3278 -1524.9171 -3124.3364
120 184.15938 -48211.013 -2707.6954 -2185.4452 -2578.6052 -3359.0358 -2185.4452 -2578.6052 -3359.0358
130 172.91252 -49449.313 -2963.7463 -2311.4656 -2660.1268 -3919.6465 -2311.4656 -2660.1268 -3919.6465
140 167.60113 -50606.928 -2779.2195 -2047.0694 -2590.7395 -3699.8497 -2047.0694 -2590.7395 -3699.8497
150 163.66337 -51405.969 -2443.913 -1764.8392 -2613.5438 -2953.3561 -1764.8392 -2613.5438 -2953.3561
160 163.45485 -52036.777 -2753.1153 -2486.3154 -2447.5772 -3325.4534 -2486.3154 -2447.5772 -3325.4534
170 165.51977 -52402.341 -2088.9303 -1764.2941 -1806.9342 -2695.5628 -1764.2941 -1806.9342 -2695.5628
180 170.46727 -52550.549 -1855.2893 -1466.9311 -1696.9482 -2401.9885 -1466.9311 -1696.9482 -2401.9885
190 177.87878 -52441.638 -1770.1744 -1531.4081 -1398.6007 -2380.5145 -1531.4081 -1398.6007 -2380.5145
200 190.34055 -52233.461 -1234.4475 -835.026 -913.86506 -1954.4514 -835.026 -913.86506 -1954.4514

1-2-10 data:
Step Temp PotEng Press Pxx Pyy Pzz tpress[1] tpress[2] tpress[3]
0 301.19057 -2007.0588 9527.9546 8727.6389 9431.037 10425.188 0 0 0
10 532.36151 -19359.608 9568.0477 9097.3725 9650.0716 9956.699 9339.81 9696.694 10011.555
20 525.42464 -21613.502 6662.8889 7063.7585 7305.3837 5619.5245 12471.272 12346.741 5651.8936
30 497.54362 -24438.41 6595.1257 6677.4702 6931.4029 6176.5039 11576.103 11593.152 6101.6324
40 453.4323 -27531.779 4959.0108 4843.7827 5895.6771 4137.5727 9810.7372 10412.475 4230.7896
50 401.78445 -30676.265 4552.0397 5173.8621 4807.4848 3674.772 9305.6981 8561.5001 3567.7778
60 353.56868 -33993.596 2042.2847 2783.9932 2417.3496 925.51142 6628.4065 5941.1102 1070.3271
70 311.8973 -37288.033 647.53285 1504.0642 566.27565 -127.74126 5127.0203 3815.4963 -139.17129
80 270.92766 -39919.323 681.46915 1301.2268 1775.6285 -1032.4479 4499.3972 4557.7101 -1155.7383
90 239.93905 -42437.655 -1511.673 -1154.5163 -1135.0346 -2245.4683 1976.0677 1514.5069 -2331.8622
100 214.82261 -44579.952 -1630.4072 -1180.3413 -1024.6065 -2686.2738 1721.4221 1533.9587 -2686.8659
110 197.01181 -46531.432 -2116.9641 -1700.038 -1525.434 -3125.4203 979.66784 790.06423 -3176.6599
120 184.16083 -48210.67 -2708.3095 -2185.7603 -2579.0421 -3360.1262 400.17271 -413.43108 -3462.5765
130 172.90904 -49448.676 -2964.2801 -2312.3359 -2660.1239 -3920.3806 340.70394 -197.60277 -3925.5777
140 167.60214 -50606.525 -2779.7248 -2047.4865 -2591.2609 -3700.427 605.79814 -273.24384 -3707.1573
150 163.64549 -51404.046 -2444.0298 -1765.3227 -2613.1225 -2953.6442 936.96131 -205.58628 -2988.1381
160 163.47105 -52037.231 -2754.1346 -2487.4089 -2447.9321 -3327.0629 310.91049 104.89344 -3286.2316
170 165.51248 -52400.865 -2088.7476 -1764.7498 -1808.1488 -2693.3443 1208.2622 798.12692 -2661.1257
180 170.48799 -52551.899 -1856.4423 -1467.3991 -1698.8896 -2403.0382 1550.2476 971.59462 -2400.9494
190 177.88315 -52440.893 -1769.7312 -1529.6285 -1395.7549 -2383.8101 1634.3877 1491.1504 -2418.6587
200 190.32551 -52231.236 -1233.4431 -831.10858 -910.87941 -1958.3413 2574.8099 2260.2149 -1929.0234

Mike

I had thought this was a bug in the fix ave/time command that led to bad output, but now I believe it can also influence the integration. Attached is a slightly modified rhodo example input script from lammps-/examples/gpu/in.gpu.rhodo. It requires the data.rhodo file from the lammps-/examples/gpu/ directory. The output below is for a double-precision gpu library and the 19Sep13 version of LAMMPS.

I run these commands to compare the cpu and gpu with thermo output every 10 timesteps:

mpirun -np 2 ~/nlammps-19Sep13/src/lmp_k20-gpu-double -in in.gpu.rhodo-ex -log log.lammps-gpu-thermo-10 -sf gpu

mpirun -np 12 ~/nlammps-19Sep13/src/lmp_k20-gpu-double -in in.gpu.rhodo-ex -log log.lammps-cpu-thermo-10

Then change to output every timestep:

sed -i "s/thermo 10/thermo 1/g" in.gpu.rhodo-ex
(or equivalent)

I run these commands to compare the cpu and gpu with thermo output every timestep:

mpirun -np 12 ~/nlammps-19Sep13/src/lmp_k20-gpu-double -in in.gpu.rhodo-ex -log log.lammps-cpu-thermo-1

mpirun -np 2 ~/nlammps-19Sep13/src/lmp_k20-gpu-double -in in.gpu.rhodo-ex -log log.lammps-gpu-thermo-1 -sf gpu

grep -A5 Step log.lammps-gpu-thermo-10
Step Temp PotEng Press Pxx Pyy Pzz Lx Ly Lz
0 301.19057 -2007.0107 9583.1664 8799.7666 9488.4849 10461.248 55 77 72.532196
10 468.04409 -14142.869 10714.165 10938.443 10371.003 10833.049 55.012057 77.01688 72.548097
20 537.99401 -19695.018 8993.6332 8498.7907 9013.8984 9468.2105 55.04738 77.066332 72.594679
30 538.54615 -20315.169 7336.3772 7105.6288 7283.6284 7619.8744 55.100884 77.141237 72.665238
40 546.02148 -21510.838 5492.4978 5833.5343 6020.6863 4623.2729 55.163683 77.229156 72.748056

grep -A5 Step log.lammps-cpu-thermo-10
Step Temp PotEng Press Pxx Pyy Pzz Lx Ly Lz
0 301.19057 -2007.1012 9583.1509 8799.7508 9488.469 10461.233 55 77 72.532196
10 468.08929 -14129.881 10853.84 11073.576 10505.223 10982.719 55.006504 77.009105 72.540773
20 538.24185 -19658.939 9345.7926 8831.5852 9347.9675 9857.8252 55.026534 77.037147 72.567188
30 539.2071 -20267.417 7855.3076 7586.6561 7771.723 8207.5438 55.0583 77.081619 72.609079
40 547.02702 -21459.606 6148.4154 6436.2229 6636.2306 5372.7928 55.098347 77.137686 72.661893

grep -A41 Step log.lammps-gpu-thermo-1 | grep "^\s\s*[0-9]*0\s"
0 301.19057 -2007.0107 9583.1664 8799.7666 9488.4849 10461.248 55 77 72.532196
10 468.08923 -14129.806 10853.851 11073.586 10505.234 10982.733 55.006504 77.009105 72.540773
20 538.24179 -19658.885 9345.8033 8831.5958 9347.9786 9857.8356 55.026534 77.037147 72.567188
30 539.20704 -20267.385 7855.3096 7586.6567 7771.721 8207.551 55.0583 77.081619 72.609079
40 547.02697 -21459.576 6148.4206 6436.2284 6636.2357 5372.7978 55.098347 77.137686 72.661893

grep -A41 Step log.lammps-cpu-thermo-1 | grep "^\s\s*[0-9]*0\s"
0 301.19057 -2007.1012 9583.1509 8799.7508 9488.469 10461.233 55 77 72.532196
10 468.08929 -14129.881 10853.84 11073.576 10505.223 10982.719 55.006504 77.009105 72.540773
20 538.24185 -19658.939 9345.7926 8831.5852 9347.9675 9857.8252 55.026534 77.037147 72.567188
30 539.2071 -20267.417 7855.3076 7586.6561 7771.723 8207.5438 55.0583 77.081619 72.609079
40 547.02702 -21459.606 6148.4154 6436.2229 6636.2306 5372.7928 55.098347 77.137686 72.661893

There are CPU/GPU differences of very small magnitude in the thermo 1 output, and the final box sizes are consistent. I would expect the thermo 10 GPU output to have the same final box size, but it does not. Changing the frequency of the thermo output should not change the final answer.

It seems LAMMPS does not know the pressure from the gpu pair calculation unless thermo output is requested for a given timestep. This seems to lead to a problem with the NPT integrator, as shown above. It also causes problems when fix ave/time attempts to use values from steps where thermo output is not calculated. I believe this was the cause of the original problem discussed in this thread.
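(One way to probe this hypothesis, as a sketch only with illustrative compute/fix IDs and file name: request the pressure through a separate compute on every single step, which forces the pair virial to be tallied regardless of the thermo interval, and then check whether the trajectory changes:)

compute pall all pressure thermo_temp                          # full pressure, using the default temperature compute
fix ptrace all ave/time 1 1 1 c_pall file press_per_step.dat   # record it every step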

Mike

in.gpu.rhodo-ex (825 Bytes)

Axel,

Sorry for the late reply; I was busy with AIChE. I rebuilt the GPU library in double precision and got the same nonsensical pressures (~ -120 atm) in the Pxx and Pyy components. Other simulation details are the same as in my initial post.

I hope this, along with what Mike is doing, helps you corner the problem.