Dear LAMMPS users,
I have been trying to get the best performance out of LAMMPS for the
DPD pair interaction using the OMP and GPU packages. Below are my
results and how I obtained them. I would be very grateful for your
comments/suggestions on how close to optimal my simulations are and
how they could be improved further.
Machine: Cray XC30 system; each node has an 8-core 64-bit Intel Sandy Bridge
CPU (Intel® Xeon® E5-2670) and an NVIDIA Tesla K20X GPU with 6 GB of GDDR5 memory.
I ran a DPD fluid for 50000 timesteps (the input scripts are at the end
of this message).
The best performance with the OMP package on one node was achieved with
8 MPI tasks and 2 OpenMP threads per task (i.e. using hyperthreading).
Time: 33m32s.
The best time with the GPU package was 18m21s, i.e. a speedup of about
1.8x over the best OMP run. This speedup looks suspicious to me.
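For reference, the 1.8x figure follows directly from the two wall-clock
times above. The short shell sketch below only redoes that arithmetic:
it converts both times to seconds and also derives timesteps per second
and atom-timesteps per second from the 50000 steps and 442368 atoms in
the scripts. These are derived numbers, not values taken from the
LAMMPS log.

```shell
#!/bin/sh
# Wall-clock times reported above, converted to seconds
omp_s=$((33 * 60 + 32))   # 33m32s -> 2012 s
gpu_s=$((18 * 60 + 21))   # 18m21s -> 1101 s
steps=50000
atoms=442368

# POSIX sh arithmetic is integer-only, so use awk for the ratios
speedup=$(awk -v a="$omp_s" -v b="$gpu_s" 'BEGIN { printf "%.2f", a / b }')
omp_tps=$(awk -v s="$steps" -v t="$omp_s" 'BEGIN { printf "%.1f", s / t }')
gpu_tps=$(awk -v s="$steps" -v t="$gpu_s" 'BEGIN { printf "%.1f", s / t }')

# Atom-timesteps per second: a size-normalized throughput measure
omp_ats=$(awk -v n="$atoms" -v s="$steps" -v t="$omp_s" 'BEGIN { printf "%.2e", n * s / t }')
gpu_ats=$(awk -v n="$atoms" -v s="$steps" -v t="$gpu_s" 'BEGIN { printf "%.2e", n * s / t }')

echo "OMP: ${omp_s} s, ${omp_tps} steps/s, ${omp_ats} atom-steps/s"
echo "GPU: ${gpu_s} s, ${gpu_tps} steps/s, ${gpu_ats} atom-steps/s"
echo "GPU speedup over OMP: ${speedup}x"
```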
This is how we call aprun for the OMP runs:
export OMP_NUM_THREADS=2
time aprun -n 8 -N 8 -d $OMP_NUM_THREADS -j 2 ./lmp-omp < in.water-cpu
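The best OMP configuration came from trying different MPI/OpenMP splits
of one node. A minimal dry-run sketch of such a sweep is shown below,
assuming 16 hardware threads per node (8 cores x 2 hyperthreads, hence
-j 2) and the same aprun flags as in the command above. It only prints
the commands so they can be checked before submission. The list of
splits and the per-run log names (passed with the LAMMPS -log switch)
are hypothetical choices, and the thread count in the `package omp`
line of in.water-cpu would also need to match OMP_NUM_THREADS for each
run.

```shell
#!/bin/sh
# Dry run: print one aprun command per MPI x OpenMP split of a single node.
# Assumption: 8 cores x 2 hyperthreads = 16 hardware threads (hence -j 2).
hw_threads=16
count=0
cmds=""
for ntasks in 16 8 4 2 1; do
  nthreads=$((hw_threads / ntasks))
  # Hypothetical log name per split, e.g. log.8x2
  cmd="OMP_NUM_THREADS=$nthreads aprun -n $ntasks -N $ntasks -d $nthreads -j 2 ./lmp-omp -log log.${ntasks}x${nthreads} < in.water-cpu"
  echo "$cmd"
  cmds="${cmds}${cmd}
"
  count=$((count + 1))
done
```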
Compiler optimizations:
For nvcc: -O3 -code=sm_35 -Xptxas --use_fast_math
For gcc: -O3 -mavx -mtune=native (plus -fopenmp for the OMP build)
OMP LAMMPS input script:
package omp 2
boundary p p p
units lj
atom_style atomic
lattice custom 3.0 a1 1.0 0.0 0.0 a2 0.0 1.0 0.0 a3 0.0 0.0 1.0 &
basis 0.5 0.0 0.0 basis 0.0 0.5 0.0 basis 0.0 0.0 0.5
region box block -24.0 24.0 -24.0 24.0 -24.0 24.0
create_box 1 box
create_atoms 1 random 442368 1234 box
mass 1 1.0
neighbor 0.3 bin
neigh_modify delay 0 every 4 check yes
comm_style brick
comm_modify vel yes
pair_style dpd/omp 0.0945 1.0 34387
pair_coeff 1 1 100.0 45.0 1.0
thermo 10000
timestep 0.001
fix 1 all nve/omp
run 50000
GPU LAMMPS input script:
package gpu 1 device kepler
boundary p p p
units lj
atom_style atomic
lattice custom 3.0 a1 1.0 0.0 0.0 a2 0.0 1.0 0.0 a3 0.0 0.0 1.0 &
basis 0.5 0.0 0.0 basis 0.0 0.5 0.0 basis 0.0 0.0 0.5
region box block -24.0 24.0 -24.0 24.0 -24.0 24.0
create_box 1 box
create_atoms 1 random 442368 1234 box
mass 1 1.0
neighbor 0.3 bin
neigh_modify delay 0 every 4 check yes
comm_style brick
comm_modify vel yes
pair_style dpd/gpu 0.0945 1.0 34387
pair_coeff 1 1 100.0 45.0 1.0
thermo 10000
timestep 0.001
fix 1 all nve
run 50000