Compiling a new pair potential with GPU support

I’m trying to create a new analytic potential that represents a Yukawa potential with the screening shifted by the diameter of a particle. I downloaded the most recent stable release from GitHub using:
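Concretely, the form I have in mind is the usual Yukawa potential with the distance shifted by the particle diameter (my notation; the shift \delta plays the same role as in lj/expand):

E(r) = A \, \frac{\exp[-\kappa\,(r - \delta)]}{r - \delta}

with A, \kappa, and the cutoff handled the same way as in pair_style yukawa.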

git clone -b release https://github.com/lammps/lammps.git stable-lammps
In this LAMMPS tree I then copied 7 files to make an expanded (shifted) Yukawa potential:

cp src/pair_yukawa.cpp src/pair_yukawa_expand.cpp
cp src/pair_yukawa.h src/pair_yukawa_expand.h
cp src/GPU/pair_yukawa_gpu.cpp src/GPU/pair_yukawa_expand_gpu.cpp
cp src/GPU/pair_yukawa_gpu.h src/GPU/pair_yukawa_expand_gpu.h
cp lib/gpu/lal_yukawa.cpp lib/gpu/lal_yukawa_expand.cpp
cp lib/gpu/lal_yukawa.cu lib/gpu/lal_yukawa_expand.cu
cp lib/gpu/lal_yukawa.h lib/gpu/lal_yukawa_expand.h

I then edited the new files to implement my potential, using the differences between the existing lj and lj/expand pair styles as a model for updating the class names, header guards, and style registration in my new yukawa_expand files.
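For example, the style-registration macro at the top of the new header has to point at the new class so LAMMPS maps the input-script name to it. A sketch of what the top of my pair_yukawa_expand.h ended up looking like (the class and member names are my own choices, modeled on pair_yukawa.h; the GPU header does the analogous thing for yukawa/expand/gpu):

// pair_yukawa_expand.h (sketch)
#ifdef PAIR_CLASS
// clang-format off
PairStyle(yukawa/expand,PairYukawaExpand);
// clang-format on
#else

#ifndef LMP_PAIR_YUKAWA_EXPAND_H
#define LMP_PAIR_YUKAWA_EXPAND_H

#include "pair.h"

namespace LAMMPS_NS {

class PairYukawaExpand : public Pair {
 public:
  PairYukawaExpand(class LAMMPS *);
  ~PairYukawaExpand() override;
  void compute(int, int) override;
  void settings(int, char **) override;
  void coeff(int, char **) override;
  // ... remaining overrides copied and adapted from pair_yukawa.h

 protected:
  double cut_global, kappa;
  double **cut, **a, **delta, **offset;    // delta holds the per-pair shift
  virtual void allocate();
};

}    // namespace LAMMPS_NS

#endif
#endif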

I then compiled it on a node of my local cluster (which has GPU support), and the CPU version worked perfectly. When I compiled it with GPU support (how I did that is at the end of this post), it also compiled fine, but when I actually went to run yukawa/expand/gpu with:

mpirun -np 8 pathtolammps/lmp_mpi -sf gpu -in colloid.inp

I get this error:

ERROR: Unrecognized pair style 'yukawa/expand/gpu' is part of the GPU package, but seems to be missing because of a dependency (../force.cpp:275)

I’m 99% sure the error is raised while the code is inside src/GPU, so I assume ../force.cpp refers to src/force.cpp, whose line 275 is:

error->all(FLERR, utils::check_packages_for_style("pair", style, lmp));

I therefore think my build is missing some package-level step that is needed for my potential to run on the GPU.

I looked through the documentation for how to handle this kind of addition and found 4.8.1, “Writing new pair styles”, which helped me make sure the potential code itself works, but I wasn’t sure about compiling it with GPU support. I also checked the package information in section 3 of the documentation but wasn’t sure where to go from there.

Essentially, I’ve been able to write code that makes the potential work on the CPU (and I think the GPU code looks sturdy as well), but I’m having trouble getting the build right so I can run the GPU version of my new potential. Everything compiles fine; the problem is the runtime error above, which suggests things aren’t being linked together correctly.

Do you have any ideas what might be the next step for fixing that error and getting my potential to work?
Thanks in advance!

The following is the text file of instructions I’ve been following:

#ssh into the node with gpu capabilities
ssh n79

#run these exports (I think this is specific to my local cluster)
export PATH=/share/apps/CENTOS7/gcc/6.5.0/bin/:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/gcc/6.5.0/lib64:$LD_LIBRARY_PATH
export PATH=/share/apps/CENTOS7/python/3.8.3/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/python/3.8.3/lib:$LD_LIBRARY_PATH
export PATH=/share/apps/CENTOS7/openmpi/4.0.4/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/CENTOS7/openmpi/4.0.4/lib:$LD_LIBRARY_PATH
which mpirun mpicc python gcc

cd ~/lammps/src/

#install some optional packages (I don't know if this is strictly needed)
make yes-colloid
make yes-misc

cd STUBS
make
cd ..

vi MAKE/Makefile.serial
# in this file replace the empty initialization of these variables with this

LMP_INC =   	-DLAMMPS_GZIP -DLAMMPS_MEMALIGN=64  -DLAMMPS_JPEG  # -DLAMMPS_CXX98
JPG_INC =   -I/usr/include
JPG_PATH =  -L/usr/lib64
JPG_LIB =   -ljpeg

#exit the file
make serial

vi MAKE/Makefile.mpi
#do the same changes as before with those 4 lines

make mpi
make yes-gpu

#now run the following exports
export CUDA_HOME=/usr/local/cuda
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib:$LD_LIBRARY_PATH
which nvcc

#go back up
cd ..

#save the original version of lib/gpu
scp -rp lib/gpu lib/gpu-orig
cd lib/gpu

vi Makefile.linux
#check that CUDA_HOME points at this machine's CUDA install (here it already reads CUDA_HOME = /usr/local/cuda)
#and change the compute architecture from this to that
CUDA_ARCH = -arch=sm_60 > CUDA_ARCH = -arch=sm_75

#save the file and run this
make -f Makefile.linux
#now go back to src
cd ../../src

#then again run:
make mpi


#then scp the lmp_mpi and lmp_serial into your directory (be in src when you do this)
scp lmp_mpi /zfshomes/saronow/new-lammps/test-lammps/lmp_mpi
scp lmp_serial /zfshomes/saronow/new-lammps/test-lammps/lmp_serial

# THEN FINALLY to run things:
#make sure you still have the same exports and run
mpirun -np 8 pathtolammps/lmp_mpi -in your_file.inp
#or for gpu
mpirun -np 8 pathtolammps/lmp_mpi -sf gpu -in colloid.inp

Two comments on this.

  1. the release branch is not the “stable” version
  2. for development we generally recommend using the develop branch and keeping it up-to-date (see the sketch below)
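A minimal sketch of switching an existing clone over (the feature-branch name is just an example):

cd stable-lammps
git checkout develop            # switch to the development branch
git pull                        # keep it up-to-date
git checkout -b yukawa-expand   # optional: keep your own changes on a branch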

You should not add new pair styles to “src” but rather to the EXTRA-PAIR package.

Since you are using the traditional GNU make build (have you tried CMake?), you have to update all installed packages whenever you make changes, so that the files from the package directories get copied into the “src” folder (CMake doesn’t need that). For this to work with the GPU package, the file “src/GPU/Install.sh” also needs to be updated so it knows about your addition.
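For illustration, that script contains a list of “action <file> <dependency>” lines; a sketch of the two entries you would add for your style, modeled on the existing pair_yukawa entries (check the exact convention against the other entries in that file):

# in src/GPU/Install.sh (sketch): install the GPU variant only when the
# base yukawa/expand style has been installed into src/
action pair_yukawa_expand_gpu.cpp pair_yukawa_expand.cpp
action pair_yukawa_expand_gpu.h pair_yukawa_expand.cpp

After editing it, re-running “make yes-gpu” (or “make package-update”) from src/ copies the updated package files into src/ before you rebuild.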

OK, thanks! I just tried again with the latest develop branch and added my potential to the EXTRA-PAIR package. I revisited 4.8.1 in the documentation and moved

src/pair_yukawa_expand.cpp(h) to src/EXTRA-PAIR/.

I was wondering what I should do with the other 5 files for the GPU version of my potential. It seems like adding only those 2 files won’t do anything for the GPU build, and when I looked at the other files in the EXTRA-PAIR directory there were no GPU files, so it didn’t seem right to put them there.

As for CMake: I tried it before with my other versions of LAMMPS, doing something like this:

mkdir build; cd build
cmake ../cmake

cmake -D PKG_GPU=yes .

cmake --build .

but then when I ran an nve script with the GPU it kept failing. However, now that I’m using the version I pulled last night, that has stopped happening, so I’m happy to go back to the CMake way.

Anyway, last night I compiled it the CMake way but also ran

cmake -D PKG_EXTRA-PAIR=yes .

as part of my CMake commands (on a version of LAMMPS with no changes except adding my 2 files to EXTRA-PAIR).
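For what it’s worth, I think the whole configuration can also be done in a single invocation, something like the following (the GPU_API and GPU_ARCH settings are my guesses, carried over from the sm_75 architecture I used in the GNU make build):

mkdir build; cd build
cmake -D PKG_EXTRA-PAIR=yes -D PKG_GPU=yes -D GPU_API=cuda -D GPU_ARCH=sm_75 ../cmake
cmake --build . -j 8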

Then I tried running a script that uses my new potential on the CPU:
pathtolammps/build/lmp -in ykw-expand.inp

and it worked perfectly. However, I was still not sure how to run it with the GPU. I naively tried just adding -sf gpu to the same line:

pathtolammps/build/lmp -sf gpu -in ykw-expand.inp

That did change the log file (it now says something about citing the GPU package), but the total times for the two runs were about the same.

Below are the log files for a very short run with the two command lines, followed by the input script I’m using. I assume there’s something I’m missing that I still have to do on the GPU side for my EXTRA-PAIR style, although there’s also a chance something about my local cluster isn’t set up right for an optimized GPU run. My main remaining question: is there anything else I’m supposed to do to get my new EXTRA-PAIR style to run on the GPU, besides adding those 2 files to the EXTRA-PAIR directory and compiling with the EXTRA-PAIR and GPU packages (assuming I’m compiling with CMake and don’t need to worry about the Install.sh step you mentioned at the end of your reply)? And is there any difference in the way I’m supposed to launch my script for it to use the GPU, besides adding -sf gpu? If those are the only changes needed, I’ll move on to debugging things on my local cluster, but I don’t want to do that before I know everything is right on the LAMMPS end.

Below is first the log file for the run with -sf gpu, then the one without it, and finally the actual input script I’m using. A lot of the thermo output has been cut to keep this brief, but the values all match.

/zfshomes/saronow/lammp2/build/lmp -sf gpu -in ykw-expand.inp
LAMMPS (27 Jun 2024)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
Reading data file ...
  orthogonal box = (0 0 -0.5) to (132.5 134.23393 0.5)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  3286 atoms
  read_data CPU = 0.029 seconds
WARNING: Calling write_dump before a full system init. (src/write_dump.cpp:70)

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- GPU package (short-range, long-range and three-body potentials): doi:10.1016/j.cpc.2010.12.021, doi:10.1016/j.cpc.2011.10.012, doi:10.1016/j.cpc.2013.08.002, doi:10.1016/j.commatsci.2014.10.068, doi:10.1016/j.cpc.2016.10.020, doi:10.3233/APC200086
- Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 20.3
  ghost atom cutoff = 20.3
  binsize = 10.15, bins = 14 14 1
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair yukawa/expand, perpetual
      attributes: half, newton on
      pair build: half/bin/atomonly/newton
      stencil: half/bin/2d
      bin: standard
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 4.757 | 4.757 | 4.757 Mbytes
   Step         KinEng         PotEng         TotEng         Press           Temp         Density
         0   0.99969568     9.7583551      10.758051      6.2009127      1              0.18475209
       100   0.46644897     10.29179       10.758239      6.3377745      0.46659097     0.18475209
MORE DATA CUT OUT
 9900   0.5048716      10.253363      10.758234      6.3268987      0.50502529     0.18475209
     10000   0.4961012      10.262137      10.758238      6.3293614      0.49625222     0.18475209
Loop time of 171.693 on 1 procs for 10000 steps with 3286 atoms

Performance: 25161.156 tau/day, 58.243 timesteps/s, 191.388 katom-step/s
99.7% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 166.47     | 166.47     | 166.47     |   0.0 | 96.96
Neigh   | 4.394      | 4.394      | 4.394      |   0.0 |  2.56
Comm    | 0.24017    | 0.24017    | 0.24017    |   0.0 |  0.14
Output  | 0.011451   | 0.011451   | 0.011451   |   0.0 |  0.01
Modify  | 0.31392    | 0.31392    | 0.31392    |   0.0 |  0.18
Other   |            | 0.2606     |            |       |  0.15

Nlocal:           3286 ave        3286 max        3286 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:           2312 ave        2312 max        2312 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:         397520 ave      397520 max      397520 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 397520
Ave neighs/atom = 120.97383
Neighbor list builds = 923
Dangerous builds = 0
System init for write_data ...
Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
System init for write_restart ...
Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Total wall time: 0:02:53

SECOND RUN WITHOUT GPU

/zfshomes/saronow/lammp2/build/lmp -in ykw-expand.inp
LAMMPS (27 Jun 2024)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
Reading data file ...
  orthogonal box = (0 0 -0.5) to (132.5 134.23393 0.5)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  3286 atoms
  read_data CPU = 0.027 seconds
WARNING: Calling write_dump before a full system init. (src/write_dump.cpp:70)

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 20.3
  ghost atom cutoff = 20.3
  binsize = 10.15, bins = 14 14 1
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair yukawa/expand, perpetual
      attributes: half, newton on
      pair build: half/bin/atomonly/newton
      stencil: half/bin/2d
      bin: standard
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 4.682 | 4.682 | 4.682 Mbytes
   Step         KinEng         PotEng         TotEng         Press           Temp         Density
         0   0.99969568     9.7583551      10.758051      6.2009127      1              0.18475209
       100   0.46644897     10.29179       10.758239      6.3377745      0.46659097     0.18475209
       200   0.44513208     10.313132      10.758264      6.3423627      0.44526758     0.18475209
 MORE DATA CUT OUT
      9900   0.5048716      10.253363      10.758234      6.3268987      0.50502529     0.18475209
     10000   0.4961012      10.262137      10.758238      6.3293614      0.49625222     0.18475209
Loop time of 172.174 on 1 procs for 10000 steps with 3286 atoms

Performance: 25090.907 tau/day, 58.081 timesteps/s, 190.854 katom-step/s
99.7% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 166.92     | 166.92     | 166.92     |   0.0 | 96.95
Neigh   | 4.362      | 4.362      | 4.362      |   0.0 |  2.53
Comm    | 0.24326    | 0.24326    | 0.24326    |   0.0 |  0.14
Output  | 0.012014   | 0.012014   | 0.012014   |   0.0 |  0.01
Modify  | 0.40272    | 0.40272    | 0.40272    |   0.0 |  0.23
Other   |            | 0.2313     |            |       |  0.13

Nlocal:           3286 ave        3286 max        3286 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:           2312 ave        2312 max        2312 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:         397520 ave      397520 max      397520 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 397520
Ave neighs/atom = 120.97383
Neighbor list builds = 923
Dangerous builds = 0
System init for write_data ...
Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
System init for write_restart ...
Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Total wall time: 0:02:52

INPUT SCRIPT

# Initialization
variable  T           world   1.00
variable  Tstart           world   1.00
variable  Tfinish           world   1.00
variable  seed        world   341341

log               colloid.log

units             lj
dimension         2
atom_style        atomic
boundary          p  p  p
pair_style        yukawa/expand  2.25 20
pair_modify       shift yes

# System definition

read_data         crystal.lmp

# Simulation parameters
mass            1    1
velocity        all create ${Tstart} ${seed} mom yes dist gaussian

# Simulation settings
dump              1 all xyz 10000 coords.xyz
write_dump        all xyz initial.xyz
timestep          0.005
pair_coeff        1    1    235   1.0

# Run
thermo            100
thermo_style      custom step ke pe etotal press temp density
thermo_modify     flush yes
fix momfix all momentum 100 linear 1 1 1



fix               1    all  nve
run               10000

write_data        final.dat
write_restart     final.res

You had already put them into the right place (the GPU files belong in src/GPU and lib/gpu).

The simple way to check is to run:

pathtolammps/build/lmp -h

This should show both that the GPU package is installed and that your yukawa/expand and yukawa/expand/gpu pair styles are included. If not, something is not quite right with your compilation, but it is almost impossible to say what from here, without being able to look over your shoulder.
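For example, something like this should list both the CPU and the GPU variant of your style if they were compiled in (the grep is just for convenience):

pathtolammps/build/lmp -h | grep -i yukawa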

This step (moving the style into EXTRA-PAIR) has nothing to do with the GPU package. It is just good practice to add new custom pair styles to the EXTRA-PAIR package.

In most cases not.
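If you want to be explicit about the resources, you can also add the -pk switch to set the number of GPUs per node, e.g. (assuming one GPU on the node):

pathtolammps/build/lmp -sf gpu -pk gpu 1 -in ykw-expand.inp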

For building LAMMPS with packages there is information in the “Build LAMMPS” section of the manual and some more details in the “Optional packages” section. For running with accelerators, there are lots of details in the “Accelerator packages” section.

The GPU package will print information to the screen if a GPU style is used and name exactly which style.

P.S.: please note that to see significant acceleration, you need to have enough atoms in your system (some 10s of thousands) and a sufficiently powerful GPU.

It works!! I put the other files back in their original places, alongside the main code in EXTRA-PAIR, and also added a modified version of the lal_yukawa_expand_ext file that I was missing. The script I’m actually using is for a much bigger system that does benefit from the GPU; I only used a shortened version for the test here, and my GPU code definitely works now! The main issue was that I was compiling manually without modifying GPU/Install.sh; fixing that got the GPU version to work with both the manual and the CMake builds. Thank you so much for all your help, I had been looking for that one little fix for many hours!
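For anyone who finds this thread later, this is the layout that ended up working for me (as I understand it, the _ext file provides the interface functions through which the src/GPU style calls into lib/gpu):

src/EXTRA-PAIR/pair_yukawa_expand.cpp
src/EXTRA-PAIR/pair_yukawa_expand.h
src/GPU/pair_yukawa_expand_gpu.cpp
src/GPU/pair_yukawa_expand_gpu.h
lib/gpu/lal_yukawa_expand.cpp
lib/gpu/lal_yukawa_expand.cu
lib/gpu/lal_yukawa_expand.h
lib/gpu/lal_yukawa_expand_ext.cpp

plus the two extra action lines in src/GPU/Install.sh for the traditional make build.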
Thanks again!

Glad to hear you could sort it out.

When you are done testing your code, please consider contributing it to the LAMMPS distribution by submitting a pull request. That will have the benefit that an expert in the GPU package will get to look it over. :wink: