Hello everyone,
I am using LAMMPS 19 Nov 2024 with ML-PACE and KOKKOS.
I have a very large cell (279’936 atoms) and I am trying to run a minimization over a small subset of atoms. In practice, I select a spherical region, set the forces on the atoms outside the region to zero, and then call the minimize command.
Here is the LAMMPS input I am using:
# spherical region centered on the site of interest (placeholder coordinates and radius)
region active_region sphere x_coord y_coord z_coord relaxation_radius
group active_group region active_region
group matrix_atoms subtract all active_group
# zero out the forces on all atoms outside the sphere
fix freeze_matrix matrix_atoms setforce 0.0 0.0 0.0
# skip neighbor pairs where both atoms are frozen
neigh_modify exclude group matrix_atoms matrix_atoms
# etol ftol maxiter maxeval
minimize 0.0 1e-2 1000000 1000000000
I am using the neigh_modify exclude command to avoid calculating forces between the atoms outside the region (at least, that is what I believe it does).
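As a side note, this is a minimal sketch of how one could monitor the forces on the matrix atoms to confirm they really stay at zero; the compute ID fzero_check is just a placeholder name:
# rough check: report the largest force components on the frozen atoms;
# they should remain exactly zero once fix setforce is applied
compute fzero_check matrix_atoms reduce max fx fy fz
thermo_style custom step pe fnorm c_fzero_check[1] c_fzero_check[2] c_fzero_check[3]
thermo 1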
My problem is that I see barely any difference in run time between this configuration and a minimization carried out over the whole system.
Below are the timings for the two cases, obtained on a small test case of 1440 atoms.
ALL-ATOM RELAXATION:
minimize 0.0 1e-1 1000000 1000000000
Neighbor list info ...
update: every = 1 steps, delay = 0 steps, check = yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 9.8
ghost atom cutoff = 9.8
binsize = 9.8, bins = 4 4 4
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair pace/kk, perpetual
attributes: full, newton on, kokkos_device
pair build: full/bin/kk/device
stencil: full/bin/3d
bin: kk/device
WARNING: Fix MINIMIZE/kk not compatible with sending data in Kokkos communication (src/KOKKOS/comm_kokkos.cpp:743)
WARNING: Fix with atom-based arrays not compatible with sending data in Kokkos communication, switching to classic exchange/border communication (src/KOKKOS/comm_kokkos.cpp:756)
WARNING: Fix MINIMIZE/kk not compatible with Kokkos sorting on device (src/KOKKOS/atom_kokkos.cpp:212)
WARNING: Fix with atom-based arrays not compatible with Kokkos sorting on device, switching to classic host sorting (src/KOKKOS/atom_kokkos.cpp:218)
Per MPI rank memory allocation (min/avg/max) = 849.9 | 849.9 | 849.9 Mbytes
Step Temp E_pair E_mol TotEng Press
0 0 -2821.2965 0 -2821.2965 138.72526
7 0 -2821.7926 0 -2821.7926 112.55489
Loop time of 0.319085 on 1 procs for 7 steps with 1440 atoms
100.1% CPU use with 1 MPI tasks x 1 OpenMP threads
Minimization stats:
Stopping criterion = force tolerance
Energy initial, next-to-last, final =
-2821.29649785665 -2821.79005654593 -2821.79257370597
Force two-norm initial, final = 1.6205834 0.074825584
Force max component initial, final = 0.88115703 0.020761894
Final line search alpha, max atom move = 1 0.020761894
Iterations, force evaluations = 7 11
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.31426 | 0.31426 | 0.31426 | 0.0 | 98.49
Neigh | 0 | 0 | 0 | 0.0 | 0.00
Comm | 0.00076918 | 0.00076918 | 0.00076918 | 0.0 | 0.24
Output | 0 | 0 | 0 | 0.0 | 0.00
Modify | 0 | 0 | 0 | 0.0 | 0.00
Other | | 0.004056 | | | 1.27
Nlocal: 1440 ave 1440 max 1440 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 4600 ave 4600 max 4600 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs: 232560 ave 232560 max 232560 min
Histogram: 1 0 0 0 0 0 0 0 0 0
RELAXATION OF A SUBREGION:
region active_region sphere 14.38009484925466 7.267417720097834 8.951932942652538 10.0
group active_group region active_region
151 atoms in group active_group
group matrix_atoms subtract all active_group
1289 atoms in group matrix_atoms
fix freeze_matrix matrix_atoms setforce 0.0 0.0 0.0
neigh_modify exclude group matrix_atoms matrix_atoms
minimize 0.0 1e-2 1000000 1000000000
WARNING: Fix MINIMIZE/kk not compatible with Kokkos sorting on device (src/KOKKOS/atom_kokkos.cpp:212)
WARNING: Fix with atom-based arrays not compatible with Kokkos sorting on device, switching to classic host sorting (src/KOKKOS/atom_kokkos.cpp:218)
Per MPI rank memory allocation (min/avg/max) = 849.9 | 849.9 | 849.9 Mbytes
Step Press PotEng TotEng
7 -1423.8964 -396.95962 -396.95962
21 -1378.622 -400.16562 -400.16562
Loop time of 0.518071 on 1 procs for 14 steps with 1440 atoms
100.0% CPU use with 1 MPI tasks x 1 OpenMP threads
Minimization stats:
Stopping criterion = force tolerance
Energy initial, next-to-last, final =
-396.959624263704 -400.165592268308 -400.165619106138
Force two-norm initial, final = 4.3817522 0.0079572012
Force max component initial, final = 0.98483545 0.0018882199
Final line search alpha, max atom move = 1 0.0018882199
Iterations, force evaluations = 14 24
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.5151 | 0.5151 | 0.5151 | 0.0 | 99.43
Neigh | 0 | 0 | 0 | 0.0 | 0.00
Comm | 0.00038764 | 0.00038764 | 0.00038764 | 0.0 | 0.07
Output | 0 | 0 | 0 | 0.0 | 0.00
Modify | 0.00074282 | 0.00074282 | 0.00074282 | 0.0 | 0.14
Other | | 0.001839 | | | 0.36
Nlocal: 1440 ave 1440 max 1440 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 4600 ave 4600 max 4600 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs: 37662 ave 37662 max 37662 min
Histogram: 1 0 0 0 0 0 0 0 0 0
I am using a PACE potential in my LAMMPS runs; thus, the pair_style is pace product.
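For completeness, the relevant pair setup looks roughly like this (the .yace file name and the element list are placeholders, not my actual files):
pair_style pace product
pair_coeff * * my_potential.yace Al   # element list must match the atom types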
Any insight on how to speed up this type of simulation is welcome.
Thanks in advance,
Lorenzo