Issue with GPU acceleration for simulating droplet formation

Hi everyone,

I am trying to simulate droplet formation of a liquid on an atomistic surface. To do this, I expanded the simulation box in order to include a gas phase, which allows the droplet to form. The system consists of 108,864 atoms, 77,760 of which belong to the water molecules and the rest to the surface.

To improve the calculation performance I used the “fix balance” command to optimize the distribution of atoms across processors.
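
The load-balancing part of the input looks roughly like this (the re-balance frequency and imbalance thresholds are placeholder values, not my exact settings):

# dynamic load balancing of the per-processor subdomains
fix            lb all balance 1000 1.1 shift xyz 10 1.05
# alternatively, recursive bisection (needs tiled communication):
# comm_style   tiled
# fix          lb all balance 1000 1.1 rcb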

Recently I have been trying to accelerate the calculation with the GPU package (Tesla K80), but without any improvement in performance. I am using LAMMPS (22 Aug 2018), where the “fix balance” command does not seem to be available for accelerating pppm/gpu. I have found that the calculation is significantly faster if kspace is turned off.
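
For reference, the GPU package is enabled roughly like this in the input (a sketch; everything apart from using one GPU is illustrative):

# enable the GPU package on the Tesla K80
package        gpu 1          # one GPU per node
suffix         gpu            # use the /gpu variants of styles where available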

I would be very grateful for any advice on this issue.

fix balance has been available in LAMMPS since 2012.

i am not surprised to see that turning off kspace speeds things up. that is quite normal, especially for systems with large amounts of vacuum, since the cost of a grid-based method depends both on the volume and on the number of atoms in the system.

axel.

Thank you Axel,

fix balance has been available in LAMMPS since 2012.

I have used “fix balance” for calculations on CPUs; however, it does not seem to be available for accelerating pppm/gpu. I used “package gpu 1 split -1.0” in order to load-balance across the CPU and GPU. When I used split = 1 the simulation did not work with pppm/gpu (as was posted here before), so I changed it to ewald/gpu; that ran without errors but was very slow. With split = -1 the simulation does not even start, as follows:

LAMMPS (22 Aug 2018)
Reading data file …
orthogonal box = (0 0 -4) to (198.48 198.48 196)
2 by 2 by 2 MPI processor grid
reading atoms …
108864 atoms
reading velocities …
108864 velocities
scanning bonds …
4 = max bonds/atom
scanning angles …
6 = max angles/atom
scanning dihedrals …
16 = max dihedrals/atom
reading bonds …
88704 bonds
reading angles …
110016 angles
reading dihedrals …
134784 dihedrals
Finding 1-2 1-3 1-4 neighbors …
special bond factors lj: 0 0 0
special bond factors coul: 0 0 0
4 = max # of 1-2 neighbors
7 = max # of 1-3 neighbors
16 = max # of 1-4 neighbors
19 = max # of special neighbors
62208 atoms in group w
Finding SHAKE clusters …
0 = # of size 2 clusters
20736 = # of size 3 clusters
0 = # of size 4 clusters
0 = # of frozen angles
1783 atoms in group low
1635 atoms in group mew
1565 atoms in group upe
dynamic group l defined
dynamic group m defined
dynamic group u defined
PPPM initialization …
using 12-bit tables for long-range coulomb (…/kspace.cpp:321)
G vector (1/distance) = 0.19312
grid = 60 60 144
stencil order = 5
estimated absolute RMS force accuracy = 0.0351914
estimated relative force accuracy = 0.000105978
using double precision FFTs
3d grid and FFT values/proc = 149450 108000
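
To be explicit, the relevant lines look roughly like this (the kspace accuracy value is a placeholder; only the split value changes between the runs described above):

package        gpu 1 split -1.0        # dynamic CPU/GPU load balancing (run shown above)
# package      gpu 1 split 1.0         # fixed split tried earlier with pppm/gpu
kspace_style   pppm/gpu 1.0e-4         # accuracy value is a placeholder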

Thank you Axel,

fix balance has been available in LAMMPS since 2012.

I have used “fix balance” for calculations on CPUs; however, it does not seem to be available for accelerating pppm/gpu.

I used “package gpu 1 split -1.0” in order to load-balance across the CPU and GPU. When I used split = 1 the simulation did not work with pppm/gpu (as was posted here before), so I changed it to ewald/gpu; that ran without errors but was very slow. With split = -1 the simulation does not even start, as follows:

you are confusing two things here: fix balance, which adjusts the size of the per-processor subdomains for improved load balancing across MPI ranks, and CPU/GPU load balancing, which is controlled by the split keyword of the package gpu command.
for your setup, where you are heavily oversubscribing your GPU, i would not use pppm/gpu at all, but rather only accelerate the pair style and run pppm on the CPU. pppm/gpu only puts parts of the PPPM calculation on the GPU anyway, while in your setup a CPU-only pppm can run concurrently with the pair style on the GPU. for optimal speed you can then tweak the pair style cutoff, since pair interactions run much more efficiently on the GPU than kspace does, so shifting work from kspace to the real-space part usually pays off.
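
i.e. something along these lines (pair style name, cutoffs, and kspace accuracy are only placeholders to illustrate the idea):

# pair style on the GPU, long-range solver on the CPU, so both can run concurrently
package        gpu 1 split -1.0
pair_style     lj/cut/coul/long/gpu 12.0     # only the pair style gets the /gpu suffix
kspace_style   pppm 1.0e-4                   # plain CPU pppm, no /gpu suffix

with that setup, increasing the real-space coulomb cutoff shifts work from pppm on the CPU to the pair style on the GPU.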

there is no GPU-accelerated ewald style in the GPU package, and ewald has worse scaling with system size than pppm, so it is no surprise that it is slow.
you may want to consider simply using a somewhat longer coulomb cutoff instead of using a lattice sum.
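
for example (cutoff values are illustrative only):

# cutoff coulomb instead of a lattice sum; no kspace at all
pair_style     lj/cut/coul/cut/gpu 12.0 15.0   # 12 angstrom lj cutoff, 15 angstrom coulomb cutoff
kspace_style   none

whether the truncated coulomb interaction is accurate enough would of course have to be checked for your system.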

do you have any numbers for how much load balancing improves performance in the CPU-only case?

axel.
