Stillinger-Weber GPU Problem

Hi All,

I am using the Stillinger-Weber three-body potential to model a system of patchy particles. I am in the testing and parameterising phase and am now trying to move to running simulations on our GPU cluster.

Below are the .in and .dat files, followed by the command lines I used. As you can see, everything is identical except for the necessary change (newton on for the CPU run, newton off for the GPU run). NOTE: these are test simulations on one set of building blocks so that I can see what the angular potential is doing; in the future, the simulations will be much larger. (The depth of the SW potential is deliberately set very deep so that I can analyse the angular dependency.)
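Since ε multiplies both the two-body and the three-body SW terms, a very deep well also makes the angular penalty large compared with the thermal energy set by the Langevin fix (T = 1.0). The three-body term from the LAMMPS pair_style sw documentation, φ3 = λε(cos θ − cos θ0)² exp(γσ/(r_ij − aσ)) exp(γσ/(r_ik − aσ)), is a non-negative penalty that vanishes at cos θ0, so the preferred angle can be checked in isolation. The Python sketch below does that for a single atom type; the parameter values are hypothetical silicon-like numbers, not the contents of sa.sw (which is not shown in the thread).

```python
import math

# Hypothetical single-type SW parameters in LJ units (NOT the values in sa.sw).
EPSILON, SIGMA, A_CUT = 1.0, 1.0, 1.8
LAMBDA, GAMMA, COSTHETA0 = 21.0, 1.2, -1.0 / 3.0  # tetrahedral preference

def radial_factor(r):
    """exp(gamma*sigma / (r - a*sigma)); zero at and beyond the cutoff a*sigma."""
    rc = A_CUT * SIGMA
    if r >= rc:
        return 0.0
    return math.exp(GAMMA * SIGMA / (r - rc))

def phi3(rij, rik, theta_deg):
    """SW three-body energy of one j-i-k triplet with angle theta (degrees) at atom i."""
    cos_t = math.cos(math.radians(theta_deg))
    angular = (cos_t - COSTHETA0) ** 2
    return LAMBDA * EPSILON * angular * radial_factor(rij) * radial_factor(rik)

def preferred_angle(rij, rik, step=0.5):
    """Scan theta over [0, 180] degrees and return the angle with the lowest phi3."""
    angles = [i * step for i in range(int(180 / step) + 1)]
    return min(angles, key=lambda t: phi3(rij, rik, t))

if __name__ == "__main__":
    print(preferred_angle(1.0, 1.0))  # close to acos(-1/3), about 109.47 degrees
```

Substituting the λ, γ, a, σ and cos θ0 values from one's own .sw file would, under the same formula, give the preferred angle and penalty strength for that parameter set.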

""" .in file for both
# initialisation
dimension 3
#newton on
boundary p p p
units lj # decide

# atom-ID molecule-ID atom-type x y z
atom_style molecular

# bring in configuration
read_data init_fin.dat

# potentials
pair_style sw
pair_coeff * * sa.sw A B C D
pair_modify shift yes

# settings
neighbor 2.0 bin
neigh_modify delay 10 check no

group metal type 1 2
group linker type 3 4

thermo 1000

dump atomdump all custom 1 atom.dump id type mol x y z
fix one all rigid/nve/small molecule
fix langDyn all langevin 1.0 1.0 1.0 524361 zero no

# run
timestep 0.003
run 100000 # equilibration
"""

""" .dat for both
LAMMPS data file. CGCMM style. atom_style full generated by VMD/TopoTools v1.5 on Thu Sep 03 14:48:08 ACST 2015
41 atoms
0 bonds
0 angles
0 dihedrals
0 impropers
4 atom types
0 bond types
0 angle types
0 dihedral types
0 improper types
-5.500000 11.500000 xlo xhi
-6.631000 10.369000 ylo yhi
-8.500000 8.500000 zlo zhi

Masses

1 1.000000 # A
2 4.090000 # B
3 1.000000 # C
4 2.750000 # D

Atoms

1 1 1 0 0 0 # A
2 1 2 0.585 -0.585 -0.585 # B
3 1 2 -0.585 0.585 -0.585 # B
4 1 2 -0.585 -0.585 0.585 # B
5 1 2 0.585 0.585 0.585 # B
6 2 1 2 0 0 # A
7 2 2 2.585 -0.585 -0.585 # B
8 2 2 1.415 0.585 -0.585 # B
9 2 2 1.415 -0.585 0.585 # B
10 2 2 2.585 0.585 0.585 # B
11 3 1 4 0 0 # A
12 3 2 4.585 -0.585 -0.585 # B
13 3 2 3.415 0.585 -0.585 # B
14 3 2 3.415 -0.585 0.585 # B
15 3 2 4.585 0.585 0.585 # B
16 4 1 6 0 0 # A
17 4 2 6.585 -0.585 -0.585 # B
18 4 2 5.415 0.585 -0.585 # B
19 4 2 5.415 -0.585 0.585 # B
20 4 2 6.585 0.585 0.585 # B
21 5 3 0 3.146 1.799 # C
22 5 3 0 4.354 1.799 # C
23 5 4 0 3.75 0.899 # D
24 5 4 0 3.75 0 # D
25 5 4 0 3.75 -0.899 # D
26 5 3 0 4.354 -1.799 # C
27 5 3 0 3.146 -1.799 # C
28 6 3 2 3.146 1.799 # C
29 6 3 2 4.354 1.799 # C
30 6 4 2 3.75 0.899 # D
31 6 4 2 3.75 0 # D
32 6 4 2 3.75 -0.899 # D
33 6 3 2 4.354 -1.799 # C
34 6 3 2 3.146 -1.799 # C
35 7 3 4 3.146 1.799 # C
36 7 3 4 4.354 1.799 # C
37 7 4 4 3.75 0.899 # D
38 7 4 4 3.75 0 # D
39 7 4 4 3.75 -0.899 # D
40 7 3 4 4.354 -1.799 # C
41 7 3 4 3.146 -1.799 # C
"""

"""cmd line input
CPU : lmp_openmpi-gpu-spdp-gnu -echo log -in sa.in -log sa.log
GPU : lmp_openmpi-gpu-spdp-gnu -echo log -in sa.in -log sa.log -sf gpu -pk gpu 1
"""
Now we have the .log files for both the CPU and GPU simulations. As you can see, there is a significant difference in both the initial potential energy and the simulation result. In the CPU case, the simulation runs to completion and does what you would expect: irreversible binding at particular angles. In the GPU case, with the latest LAMMPS version (3Sep2015) the simulation blows up, and with the version I was previously using (CHECK THIS!!!!!!!) the particles just "jiggle" in response to the Langevin fix, implying that the attractive part of the SW potential isn't having any effect.
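Comparing the step-0 potential energy of the two runs is the quickest way to tell whether the GPU pair style computes the same energy as the CPU one before any dynamics happens. A minimal Python sketch for pulling the thermo table out of a LAMMPS log is shown here; it assumes the usual layout (a header line starting with `Step`, numeric rows, then a `Loop time` line). Note that both command lines above write to sa.log, so one run's log would need a different `-log` name; the file names in the usage comment are hypothetical.

```python
def read_thermo(log_text):
    """Collect numeric thermo rows from a LAMMPS log as dicts keyed by column name."""
    rows, header = [], None
    for line in log_text.splitlines():
        fields = line.split()
        if fields[:1] == ["Step"]:
            header = fields  # e.g. Step Temp E_pair E_mol TotEng Press
        elif line.startswith("Loop time"):
            header = None    # end of this run's thermo block
        elif header and len(fields) == len(header):
            try:
                rows.append(dict(zip(header, (float(f) for f in fields))))
            except ValueError:
                pass         # same width as the header but not numeric
    return rows

def compare_first_step(cpu_rows, gpu_rows, column="E_pair"):
    """Return (step, cpu_value, gpu_value) at the first step both logs share, or None."""
    gpu_by_step = {row["Step"]: row for row in gpu_rows}
    for row in cpu_rows:
        other = gpu_by_step.get(row["Step"])
        if other is not None:
            return row["Step"], row[column], other[column]
    return None

# Excerpt of the CPU log in this thread.
CPU_EXCERPT = """Step Temp E_pair E_mol TotEng Press
       0 0 1.2308816 0 1.2308816 0
    1000 2.7110716 -9.1944283 0 -7.9050162 0.019694268
    2000 1.5107948 -9.7073514 0 -8.9888026 -0.0079306458
Loop time of 11.0311 on 1 procs for 100000 steps with 41 atoms
"""
cpu_rows = read_thermo(CPU_EXCERPT)

# Usage with real files (hypothetical names):
#   cpu = read_thermo(open("sa_cpu.log").read())
#   gpu = read_thermo(open("sa_gpu.log").read())
#   print(compare_first_step(cpu, gpu))
```

If the two step-0 E_pair values already differ for identical input coordinates, the discrepancy lies in the force/energy evaluation rather than in the integrator or thermostat.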

""" CPU output
LAMMPS (29 Aug 2015)
# initialisation
dimension 3
#newton on
boundary p p p
units lj # decide

# atom-ID molecule-ID atom-type x y z
atom_style molecular

# bring in configuration
read_data init_fin.dat
  orthogonal box = (-5.5 -6.631 -8.5) to (11.5 10.369 8.5)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  41 atoms
  0 = max # of 1-2 neighbors
  0 = max # of 1-3 neighbors
  0 = max # of 1-4 neighbors
  1 = max # of special neighbors

# potentials
pair_style sw
pair_coeff * * sa.sw A B C D
Reading potential file sa.sw with DATE: 2015-08-20
pair_modify shift yes

# settings
neighbor 2.0 bin
neigh_modify delay 10 check no

group metal type 1 2
20 atoms in group metal
group linker type 3 4
21 atoms in group linker

thermo 1000

dump atomdump all custom 1 atom.dump id type mol x y z
fix one all rigid/nve/small molecule
7 rigid bodies with 41 atoms
  1.89769 = max distance from body owner to body atom
fix langDyn all langevin 1.0 1.0 1.0 524361 zero no

# run
timestep 0.003
run 100000 # equilibration
Neighbor list info ...
  1 neighbor list requests
  update every 1 steps, delay 10 steps, check no
  master list distance cutoff = 5.036
  ghost atom cutoff = 5.036
  binsize = 2.518 -> bins = 7 7 7
Memory usage per processor = 11.2005 Mbytes
Step Temp E_pair E_mol TotEng Press
       0 0 1.2308816 0 1.2308816 0
    1000 2.7110716 -9.1944283 0 -7.9050162 0.019694268
    2000 1.5107948 -9.7073514 0 -8.9888026 -0.0079306458
    3000 0.93520057 -9.8241235 0 -9.379333 0.0053456104
    4000 1.0906908 -9.847369 0 -9.3286259 0.0050080574
    5000 1.1364157 -9.8208016 0 -9.2803112 -0.009174185
    6000 1.3357255 -9.6815657 0 -9.0462816 -0.037774577
    7000 1.2984942 -9.8520559 0 -9.2344794 -0.0069013174
    8000 2.3045147 -15.030861 0 -13.934811 -0.020193718
    9000 1.2221497 -15.372095 0 -14.790829 -0.00058408825
   10000 1.0616485 -15.407475 0 -14.902545 0.020892659
   11000 0.91087958 -15.40347 0 -14.970247 -0.00073592008
   12000 1.6335144 -18.698822 0 -17.921906 -0.022794724
   13000 1.2813069 -18.971443 0 -18.362041 -0.021521265
   14000 1.1485899 -18.838424 0 -18.292143 -0.03694827
   15000 2.4514733 -20.440516 0 -19.274572 -0.11190808
   16000 1.0575377 -22.54894 0 -22.045964 -0.014397182
   17000 1.3349861 -22.523331 0 -21.888398 0.024794428
   18000 0.91207475 -22.644379 0 -22.210588 -0.04056296
   19000 1.832487 -22.571856 0 -21.700307 0.011538614
   20000 1.4002912 -22.54827 0 -21.882277 -0.0277438
   21000 0.8506624 -22.731448 0 -22.326864 -0.045125086
   22000 1.3143871 -22.735235 0 -22.110099 -0.00080099796
   23000 1.2311549 -22.596875 0 -22.011326 -0.075441219
   24000 0.78124613 -22.562906 0 -22.191338 0.058578892
   25000 1.4917853 -22.66575 0 -21.956242 -0.02474423
   26000 0.94004427 -22.553644 0 -22.10655 0.02413593
   27000 1.2680626 -22.768888 0 -22.165785 0.0043430591
   28000 0.97811915 -22.55631 0 -22.091107 0.037683225
   29000 1.0014401 -22.652356 0 -22.176061 0.0079897236
   30000 1.2237054 -22.772006 0 -22.19 -0.0016892709
   31000 0.98949037 -22.513278 0 -22.042667 0.0083929002
   32000 1.1603494 -22.565838 0 -22.013964 0.039555464
   33000 1.1909887 -22.671503 0 -22.105057 -0.069068652
   34000 0.83236851 -22.555312 0 -22.159429 -0.037126642
   35000 0.95772698 -22.528168 0 -22.072664 -0.045770645
   36000 1.2307973 -22.708279 0 -22.1229 -0.065491668
   37000 1.1142684 -22.556924 0 -22.026967 0.068989535
   38000 1.4083013 -22.627103 0 -21.957301 -0.051103109
   39000 1.2199974 -22.659522 0 -22.079279 -0.031532029
   40000 1.3242981 -22.62046 0 -21.990611 0.050069896
   41000 1.1811206 -22.739511 0 -22.177758 0.00042972183
   42000 1.0818674 -22.632171 0 -22.117625 0.0075090237
   43000 1.2629219 -22.661793 0 -22.061135 0.0039792648
   44000 1.168607 -22.681943 0 -22.126142 0.032521217
   45000 0.72499782 -22.688676 0 -22.34386 -0.028954744
   46000 0.88785853 -22.535306 0 -22.113032 0.0019003569
   47000 1.0699038 -22.53418 0 -22.025323 0.03273213
   48000 1.1452294 -22.539984 0 -21.995302 -0.023980004
   49000 1.1078029 -22.398678 0 -21.871796 0.017362745
   50000 1.0468605 -22.768557 0 -22.27066 -0.040559963
   51000 1.0407185 -22.520526 0 -22.02555 0.060690501
   52000 1.0158089 -22.740736 0 -22.257607 0.086715631
   53000 1.061659 -22.63546 0 -22.130525 0.012645193
   54000 1.0732175 -22.581552 0 -22.071119 0.021369808
   55000 1.3905587 -22.643967 0 -21.982604 0.048490814
   56000 0.94634984 -22.704445 0 -22.254352 0.038778481
   57000 0.86492862 -22.485103 0 -22.073735 0.0014345469
   58000 1.1885398 -22.66684 0 -22.101559 -0.02201256
   59000 1.2958178 -22.505558 0 -21.889255 0.021320442
   60000 1.0494893 -22.587211 0 -22.088063 -0.022174458
   61000 0.98685257 -22.60712 0 -22.137763 0.052859268
   62000 1.0772396 -22.593076 0 -22.080731 -0.052302949
   63000 1.0437253 -22.500267 0 -22.003861 0.045211045
   64000 1.1681153 -22.695318 0 -22.139751 -0.049976988
   65000 1.2885053 -22.593497 0 -21.980671 -0.035556036
   66000 1.5847052 -22.574424 0 -21.820723 0.020778513
   67000 0.94974968 -22.53164 0 -22.079929 -0.079755295
   68000 1.034079 -22.619378 0 -22.12756 -0.000618855
   69000 0.97753529 -22.610508 0 -22.145582 0.0081138557
   70000 0.82709299 -22.689392 0 -22.296019 0.0081910325
   71000 1.3223447 -22.520436 0 -21.891516 -0.0066960626
   72000 0.70970299 -22.435553 0 -22.098012 0.010927805
   73000 1.3394805 -22.408581 0 -21.771511 0.036480665
   74000 0.91131107 -22.594459 0 -22.161031 -0.08331835
   75000 1.2914389 -22.589922 0 -21.975702 0.12014528
   76000 0.88205743 -22.708161 0 -22.288646 -0.068566226
   77000 1.208503 -22.702439 0 -22.127664 0.011461871
   78000 1.0175151 -22.595499 0 -22.111559 0.0073326127
   79000 1.037866 -22.660008 0 -22.166389 -0.0048219405
   80000 1.1918024 -22.512547 0 -21.945714 -0.0047141458
   81000 1.3607701 -22.556817 0 -21.909621 -0.051729727
   82000 1.0419028 -22.632963 0 -22.137424 0.001864072
   83000 1.0407532 -22.66302 0 -22.168028 -0.026713677
   84000 0.69315048 -22.534816 0 -22.205146 -0.0027664682
   85000 0.84141467 -22.567906 0 -22.167721 0.065548789
   86000 1.2987284 -22.660888 0 -22.0432 0.039629959
   87000 1.1257664 -22.738975 0 -22.203549 0.010828917
   88000 1.1892293 -22.493503 0 -21.927894 0.026608831
   89000 1.9248536 -27.980886 0 -27.065407 0.12840739
   90000 1.1487442 -28.105782 0 -27.559428 -0.005945451
   91000 1.2732826 -28.202637 0 -27.597052 0.015001538
   92000 1.3169628 -28.253376 0 -27.627016 -0.044282212
   93000 0.67037506 -28.188683 0 -27.869846 0.022696448
   94000 0.93366065 -28.123442 0 -27.679384 -0.05507667
   95000 1.2242134 -28.141497 0 -27.559249 -0.0056333751
   96000 3.4616147 -32.534006 0 -30.887628 -0.05682235
   97000 0.815488 -33.683875 0 -33.296021 0.025283309
   98000 1.0503114 -33.74562 0 -33.246081 -0.048108566
   99000 1.6513703 -33.608152 0 -32.822744 -0.0033353658
  100000 1.2193483 -33.671699 0 -33.091765 0.012517318
Loop time of 11.0311 on 1 procs for 100000 steps with 41 atoms

Performance: 2349714.590 tau/day, 9065.257 timesteps/s
98.2% CPU use with 1 MPI tasks x no OpenMP threads

MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
"""

Hi Andrew,

Can you rebuild the GPU library with the attached source file (lal_sw.cu), then rebuild LAMMPS with the newly built libgpu.a, and check whether the GPU run works?

If the problem with the GPU run persists, please send me the force field file sa.sw so that I can reproduce the issue on my side.

Thanks,
-Trung

sw.diff (2.24 KB)

lal_sw.cu (31.3 KB)

Hey Trung,

That fixed the issue!

Thank you very much,

Andrew

Hi Andrew,

glad to hear the updated source file fixed the issue. Thanks for reporting it.

Steve, please include the updated lal_sw.cu in the next patch. I introduced the bugs when I extended the original sw/gpu implementation to support systems with multiple atom types.
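Multiple atom types complicate SW because the .sw file holds one entry per ordered element triplet (4 types, as in sa.sw's A B C D mapping, means 4³ = 64 entries), and each three-body term mixes three of those entries. The Python sketch below illustrates that lookup: the pair-like radial factors come from the (i, j, j) and (i, k, k) entries, while λ, ε and cos θ0 come from the (i, j, k) entry. This mirrors, as far as we understand it, the CPU pair_style sw code; it is not the lal_sw.cu kernel and does not reproduce the specific bug that was fixed. All parameter values are hypothetical.

```python
import math
from itertools import product

def build_table(elements, default, overrides):
    """One parameter dict per ordered (e1, e2, e3) triplet, as in a .sw file."""
    table = {}
    for triplet in product(elements, repeat=3):
        entry = dict(default)
        entry.update(overrides.get(triplet, {}))
        table[triplet] = entry
    return table

def radial(r, p):
    """exp(gamma*sigma / (r - a*sigma)) from one pair entry; zero beyond a*sigma."""
    rc = p["a"] * p["sigma"]
    return math.exp(p["gamma"] * p["sigma"] / (r - rc)) if r < rc else 0.0

def three_body(table, ti, tj, tk, rij, rik, cos_theta):
    """SW three-body energy of the j-i-k triplet centred on atom i."""
    pij = table[(ti, tj, tj)]   # pair-like piece for i-j
    pik = table[(ti, tk, tk)]   # pair-like piece for i-k
    pijk = table[(ti, tj, tk)]  # angular strength and preferred angle
    angular = (cos_theta - pijk["costheta0"]) ** 2
    return pijk["lambda"] * pijk["epsilon"] * angular * radial(rij, pij) * radial(rik, pik)

# Hypothetical parameters (NOT the contents of sa.sw). A, B, p, q are two-body only.
DEFAULT = {"epsilon": 1.0, "sigma": 1.0, "a": 1.8, "lambda": 21.0, "gamma": 1.2,
           "costheta0": -1.0 / 3.0, "A": 7.05, "B": 0.6, "p": 4.0, "q": 0.0, "tol": 0.0}
# An A-centred triplet with one A and one B neighbour prefers 90 degrees. Both
# orderings are overridden so the result does not depend on neighbour order.
OVERRIDES = {("A", "A", "B"): {"costheta0": 0.0}, ("A", "B", "A"): {"costheta0": 0.0}}

table = build_table(("A", "B"), DEFAULT, OVERRIDES)
e_mixed = three_body(table, "A", "A", "B", 1.0, 1.0, 0.0)  # at its preferred angle
e_pure = three_body(table, "A", "A", "A", 1.0, 1.0, 0.0)   # same geometry, A-A-A entry
```

Reading the wrong entry for any of the three lookups silently changes the preferred angle or the range of the penalty, which is why a kernel that works for a single type can still misbehave once several types are present.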

Thanks,
-Trung


Got it, will be in the next patch.

thanks,

Steve