Issues with combination of fix move, periodic boundary and KOKKOS

Hi people,

I’m doing a simulation of a sheet being dragged on top of a substrate, where I run into trouble when the sheet transitions through the periodic boundary. This is only a problem when running it with KOKKOS.

My problem can be reproduced with the following simplified script, in which I drag a free-floating graphene sheet by its end. That is, I apply a fix move to a dedicated part of the sheet (the pull block, PB), while the remaining part is modeled by a Tersoff potential fitted for graphene. I use periodic boundary conditions in the plane of the sheet (x, y), and as soon as the non-fixed part of the sheet crosses the boundary, it rips apart from the PB and atoms scatter from the connections. The reproduction script reads as follows:

units metal
newton on 
boundary p p m  
atom_style atomic 

# Graphene lattice
lattice custom 2.419 &
        a1    0                 1.0     0     &
        a2    $(sqrt(3)/2)      0.5     0     &
        a3    $(1/(2*sqrt(3)))  0.5     0.83  &
        basis 0                 0       0     &
        basis $(1/3)            $(1/3)  0.0

# Simulation box & atoms
region simreg block 5 15 5 15 0 0.5 
region boxreg block 0 20 0 20 -5 5 
create_box 1 boxreg
create_atoms 1 region simreg basis 1 1

# Pull part of the sheet
region PB_region block INF INF 65.0 INF INF 1 units box # Pull block
group PB region PB_region 
group integrate subtract all PB
fix move_PB PB move linear 0 2 0 units box 

# Dynamics            
mass 1 12.0107  

velocity integrate create 1.0 5432373 dist gaussian
pair_style tersoff
pair_coeff * * C.tersoff C 

timestep 0.001
fix nve integrate nve 

# Output
thermo 1000
dump 1 all custom 100 dump_reproduce.data id type x y z vx vy vz
run 25000

where the file C.tersoff reads

C C C 3.0 1.0 0.0 3.8049e4 4.3484 -0.57058 0.72751 1.5724e-7 2.2119 346.74 1.95 0.15 3.4879 1393.6

This runs perfectly fine on the CPU, but when running it with KOKKOS, using the following job script, it fails as described above.

#!/bin/bash

#SBATCH --job-name=NG4_GPU
#
#SBATCH --partition=normal
#
#SBATCH --ntasks=1
#
#SBATCH --cpus-per-task=2
#
#SBATCH --gres=gpu:1
#
#SBATCH --output=slurm.out
#

mpirun -n 1 lmp -pk kokkos newton on neigh half -k on g 1 -sf kk -in reproduce.in

My main suspicion is that KOKKOS might not support this, given the warning

WARNING: Fixes cannot yet send exchange data in Kokkos communication, switching to classic exchange/border communication (src/KOKKOS/comm_kokkos.cpp:651)

In order to investigate the problem further, I ran a simulation where I removed the fix move (unfix) before the sheet passed through the periodic boundary, and also a simulation where I applied the fix move to the whole sheet (no interatomic potential). Both ran perfectly fine; thus it is only the combination of a fix move on the PB with the remaining part modeled by the interatomic potential that cannot pass the periodic boundary successfully.
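For reference, the first diagnostic (releasing the PB before the boundary crossing) can be sketched as a two-stage run like the one below. The 12500-step split point and the new fix name nve_PB are illustrative, not the exact values from my actual test:

```
# Drag the PB for the first part of the run ...
run 12500
# ... then release it before the sheet reaches the periodic boundary
# and integrate the released PB atoms with plain NVE from here on
unfix move_PB
fix nve_PB PB nve
run 12500
```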

I hope that you might be able to guide me in the right direction here.
Thanks in advance.

Best regards
Mikkel

This is being worked on: Kokkos exchange comm for fixes by valleymouth · Pull Request #1394 · lammps/lammps · GitHub

However, I am not certain whether this is the cause for your issues.
This is something for @stamoor to look at.


Thanks for the fast reply @akohlmey. There might be a connection here. My main code (more advanced than the one shown here) also ran slower than expected due to the time spent in the ‘Modify’ category of the MPI task timing breakdown. Hopefully @stamoor has some insight here.

The warning should only affect performance: it has to pack/unpack MPI buffers for exchange/border communication on the host CPU instead of on the GPU. Anything that affects the dynamics would be a bug in the Kokkos package. Can you try compiling with UVM enabled and see if it runs correctly: -D Kokkos_ENABLE_CUDA_UVM=ON
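For example, a CMake configuration along these lines should pick up the flag; note that the build directory layout and the architecture option (Pascal is used here purely as an example) are assumptions — match them to your own setup and GPU:

```
# Hypothetical reconfigure of a LAMMPS Kokkos/CUDA build with UVM enabled;
# the architecture flag (here Pascal) is an example -- match your GPU
cmake ../cmake -D PKG_KOKKOS=on -D Kokkos_ENABLE_CUDA=on \
      -D Kokkos_ARCH_PASCAL60=on -D Kokkos_ENABLE_CUDA_UVM=ON
cmake --build . --parallel
```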

This did indeed produce the expected dynamics. It now runs about 10 times slower, which is probably expected since it uses unified memory.

OK, that proves it is a bug. I’m surprised it is 10x slower, though; what GPU are you using? I’ll try to fix it ASAP, but this week is really busy.


Thanks for the help! I’m using an NVIDIA Tesla P100-PCIE-16GB (56 CUs, 16 GB, 1.3 GHz), but I’ll have to double-check that slowdown figure, since the original compilation also seems to be running slower than I remembered. I’ll get back to you on that.

I have been running some quick tests on the script shown here, and from those I find a speed difference of at least a factor of 4 favoring the original KOKKOS compilation with Kokkos_ENABLE_CUDA_UVM=OFF.
