Ok, thanks! I just tried again with the latest develop branch version, this time adding my potential to the EXTRA-PAIR package. Following section 4.8.1 of the documentation, I moved
src/pair_yukawa_expand.cpp and src/pair_yukawa_expand.h to src/EXTRA-PAIR/.
I'm not sure what to do with the other 5 files for the GPU version of my potential: adding only those 2 files won't do anything for GPU, and when I looked through the EXTRA-PAIR directory there were no GPU-related files in it, so it didn't seem right to put them there.
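My best guess, based on how the existing yukawa/gpu files appear to be organized in the source tree, is that they would go into src/GPU/ and lib/gpu/ along these lines (the file names below are only my analogy to the existing yukawa/gpu files, not what my files are actually called):
cp pair_yukawa_expand_gpu.cpp pair_yukawa_expand_gpu.h pathtolammps/src/GPU/
cp lal_yukawa_expand.cu lal_yukawa_expand.cpp lal_yukawa_expand.h pathtolammps/lib/gpu/
but I would like to confirm that before I go down that road.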
As for CMake: I had tried CMake before, with my other versions of LAMMPS, doing something like this
mkdir build; cd build
cmake ../cmake
cmake -D PKG_GPU=yes .
cmake --build .
but back then, when I ran an NVE script with GPU, it kept failing. Now that I'm using the stable release as of last night, that has stopped happening, so I'm happy to go back to the CMake route.
Anyway, last night I compiled it the CMake way, but added
cmake -D PKG_EXTRA-PAIR=yes .
to my CMake commands (on a version of LAMMPS with no changes except my 2 files added to EXTRA-PAIR).
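So, to spell out what I ran, the build sequence last night was essentially:
mkdir build; cd build
cmake ../cmake
cmake -D PKG_GPU=yes .
cmake -D PKG_EXTRA-PAIR=yes .
cmake --build .
# note: I did not set GPU_API or GPU_ARCH; I'm only guessing that something like
# -D GPU_API=cuda (and a GPU_ARCH value matching our cards) might matter, and
# I haven't tried that yet.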
Then I tried running a script with my new potential on CPU,
pathtolammps/build/lmp -in ykw-expand.inp
and it worked perfectly. However, I was still not sure how to run it on GPU.
I naively tried just adding -sf gpu to the same command line:
pathtolammps/build/lmp -sf gpu -in ykw-expand.inp
That did change the log file (it now mentions citing the GPU package), but the total wall times for the two runs were about the same.
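One thing I have not tried yet is passing the package settings explicitly on the command line, e.g. something like this (assuming I should be requesting 1 GPU per node):
pathtolammps/build/lmp -sf gpu -pk gpu 1 -in ykw-expand.inp
but I didn't want to keep guessing at flags before checking with you.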
I assume there's something I'm missing that I have to do for GPU with my EXTRA-PAIR style, although there's a chance it's something about my local cluster that I'm not doing right to get an optimized GPU run. My main remaining question is: is there anything else I'm supposed to do to get my new EXTRA-PAIR style to run on GPU besides adding those 2 files to the EXTRA-PAIR directory and compiling with the EXTRA-PAIR and GPU packages (assuming I'm compiling with CMake and don't need to worry about the installation steps you mentioned at the end of your response)? Additionally, is there any difference in how I'm supposed to launch the script for it to use the GPU, besides adding -sf gpu? If those are the only changes I need to make, I'll move on to debugging things on my local cluster, but I don't want to do that before I know everything is in order on the LAMMPS side.
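One sanity check I'm planning to run myself, in case it's relevant: grep the log for the suffixed style name and for any device info, since I would expect a genuine GPU run to report yukawa/expand/gpu in the neighbor list section rather than plain yukawa/expand (please correct me if that expectation is wrong):
grep -i "yukawa/expand" colloid.log   # I assume this would show yukawa/expand/gpu if the suffix took effect
grep -i "device" colloid.log          # I assume a real GPU run prints some device/acceleration summary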
Below is, first, the log file for a very small run with -sf gpu, then the same run without it, and finally the actual input script I'm using. Most of the thermo output has been cut out to keep this brief, but the two runs match.
/zfshomes/saronow/lammp2/build/lmp -sf gpu -in ykw-expand.inp
LAMMPS (27 Jun 2024)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
Reading data file ...
orthogonal box = (0 0 -0.5) to (132.5 134.23393 0.5)
1 by 1 by 1 MPI processor grid
reading atoms ...
3286 atoms
read_data CPU = 0.029 seconds
WARNING: Calling write_dump before a full system init. (src/write_dump.cpp:70)
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Your simulation uses code contributions which should be cited:
- GPU package (short-range, long-range and three-body potentials): doi:10.1016/j.cpc.2010.12.021, doi:10.1016/j.cpc.2011.10.012, doi:10.1016/j.cpc.2013.08.002, doi:10.1016/j.commatsci.2014.10.068, doi:10.1016/j.cpc.2016.10.020, doi:10.3233/APC200086
- Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419
The log file lists these citations in BibTeX format.
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
update: every = 1 steps, delay = 0 steps, check = yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 20.3
ghost atom cutoff = 20.3
binsize = 10.15, bins = 14 14 1
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair yukawa/expand, perpetual
attributes: half, newton on
pair build: half/bin/atomonly/newton
stencil: half/bin/2d
bin: standard
Setting up Verlet run ...
Unit style : lj
Current step : 0
Time step : 0.005
Per MPI rank memory allocation (min/avg/max) = 4.757 | 4.757 | 4.757 Mbytes
Step KinEng PotEng TotEng Press Temp Density
0 0.99969568 9.7583551 10.758051 6.2009127 1 0.18475209
100 0.46644897 10.29179 10.758239 6.3377745 0.46659097 0.18475209
MORE DATA CUT OUT
9900 0.5048716 10.253363 10.758234 6.3268987 0.50502529 0.18475209
10000 0.4961012 10.262137 10.758238 6.3293614 0.49625222 0.18475209
Loop time of 171.693 on 1 procs for 10000 steps with 3286 atoms
Performance: 25161.156 tau/day, 58.243 timesteps/s, 191.388 katom-step/s
99.7% CPU use with 1 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 166.47 | 166.47 | 166.47 | 0.0 | 96.96
Neigh | 4.394 | 4.394 | 4.394 | 0.0 | 2.56
Comm | 0.24017 | 0.24017 | 0.24017 | 0.0 | 0.14
Output | 0.011451 | 0.011451 | 0.011451 | 0.0 | 0.01
Modify | 0.31392 | 0.31392 | 0.31392 | 0.0 | 0.18
Other | | 0.2606 | | | 0.15
Nlocal: 3286 ave 3286 max 3286 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 2312 ave 2312 max 2312 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 397520 ave 397520 max 397520 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Total # of neighbors = 397520
Ave neighs/atom = 120.97383
Neighbor list builds = 923
Dangerous builds = 0
System init for write_data ...
Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
System init for write_restart ...
Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Total wall time: 0:02:53
SECOND RUN WITHOUT GPU
/zfshomes/saronow/lammp2/build/lmp -in ykw-expand.inp
LAMMPS (27 Jun 2024)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
Reading data file ...
orthogonal box = (0 0 -0.5) to (132.5 134.23393 0.5)
1 by 1 by 1 MPI processor grid
reading atoms ...
3286 atoms
read_data CPU = 0.027 seconds
WARNING: Calling write_dump before a full system init. (src/write_dump.cpp:70)
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Your simulation uses code contributions which should be cited:
- Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419
The log file lists these citations in BibTeX format.
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
update: every = 1 steps, delay = 0 steps, check = yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 20.3
ghost atom cutoff = 20.3
binsize = 10.15, bins = 14 14 1
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair yukawa/expand, perpetual
attributes: half, newton on
pair build: half/bin/atomonly/newton
stencil: half/bin/2d
bin: standard
Setting up Verlet run ...
Unit style : lj
Current step : 0
Time step : 0.005
Per MPI rank memory allocation (min/avg/max) = 4.682 | 4.682 | 4.682 Mbytes
Step KinEng PotEng TotEng Press Temp Density
0 0.99969568 9.7583551 10.758051 6.2009127 1 0.18475209
100 0.46644897 10.29179 10.758239 6.3377745 0.46659097 0.18475209
200 0.44513208 10.313132 10.758264 6.3423627 0.44526758 0.18475209
MORE DATA CUT OUT
9900 0.5048716 10.253363 10.758234 6.3268987 0.50502529 0.18475209
10000 0.4961012 10.262137 10.758238 6.3293614 0.49625222 0.18475209
Loop time of 172.174 on 1 procs for 10000 steps with 3286 atoms
Performance: 25090.907 tau/day, 58.081 timesteps/s, 190.854 katom-step/s
99.7% CPU use with 1 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 166.92 | 166.92 | 166.92 | 0.0 | 96.95
Neigh | 4.362 | 4.362 | 4.362 | 0.0 | 2.53
Comm | 0.24326 | 0.24326 | 0.24326 | 0.0 | 0.14
Output | 0.012014 | 0.012014 | 0.012014 | 0.0 | 0.01
Modify | 0.40272 | 0.40272 | 0.40272 | 0.0 | 0.23
Other | | 0.2313 | | | 0.13
Nlocal: 3286 ave 3286 max 3286 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 2312 ave 2312 max 2312 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 397520 ave 397520 max 397520 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Total # of neighbors = 397520
Ave neighs/atom = 120.97383
Neighbor list builds = 923
Dangerous builds = 0
System init for write_data ...
Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
System init for write_restart ...
Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Total wall time: 0:02:52
INPUT SCRIPT
# Initialization
variable T world 1.00
variable Tstart world 1.00
variable Tfinish world 1.00
variable seed world 341341
log colloid.log
units lj
dimension 2
atom_style atomic
boundary p p p
pair_style yukawa/expand 2.25 20
pair_modify shift yes
# System definition
read_data crystal.lmp
# Simulation parameters
mass 1 1
velocity all create ${Tstart} ${seed} mom yes dist gaussian
# Simulation settings
dump 1 all xyz 10000 coords.xyz
write_dump all xyz initial.xyz
timestep 0.005
pair_coeff 1 1 235 1.0
# Run
thermo 100
thermo_style custom step ke pe etotal press temp density
thermo_modify flush yes
fix momfix all momentum 100 linear 1 1 1
fix 1 all nve
run 10000
write_data final.dat
write_restart final.res