Hello everyone,
I am using LAMMPS 19 Nov 2024 with ML-PACE and KOKKOS.
I have a very large cell (279’936 atoms) and I am trying to run a minimization over a small subset of atoms. In practice, I select a spherical region, set the forces on the atoms outside the region to zero, and then call the minimize command.
Here is the LAMMPS input I am using:
# spherical region centered on the site of interest (placeholder coordinates and radius)
region active_region sphere x_coord y_coord z_coord relaxation_radius
group active_group region active_region
group matrix_atoms subtract all active_group
# zero out the forces on all atoms outside the sphere
fix freeze_matrix matrix_atoms setforce 0.0 0.0 0.0
# skip neighbor pairs where both atoms are frozen
neigh_modify exclude group matrix_atoms matrix_atoms
# etol ftol maxiter maxeval
minimize 0.0 1e-2 1000000 1000000000
I am using the neigh_modify exclude command to avoid calculating forces between the atoms outside the region (at least, that is what I believe it does).
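As a side note, this is a minimal sketch of how one could monitor the forces on the matrix atoms to confirm they really stay at zero; the compute ID fzero_check is just a placeholder name:
# rough check: report the largest force components on the frozen atoms;
# they should remain exactly zero once fix setforce is applied
compute fzero_check matrix_atoms reduce max fx fy fz
thermo_style custom step pe fnorm c_fzero_check[1] c_fzero_check[2] c_fzero_check[3]
thermo 1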
My problem is that I see barely any difference in run time between this configuration and a minimization carried out over the whole system.
Below are the timings for the two cases, obtained on a small test case of 1440 atoms.
ALL-ATOM RELAXATION:
minimize 0.0 1e-1 1000000 1000000000
Neighbor list info ...
update: every = 1 steps, delay = 0 steps, check = yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 9.8
ghost atom cutoff = 9.8
binsize = 9.8, bins = 4 4 4
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair pace/kk, perpetual
attributes: full, newton on, kokkos_device
pair build: full/bin/kk/device
stencil: full/bin/3d
bin: kk/device
WARNING: Fix MINIMIZE/kk not compatible with sending data in Kokkos communication (src/KOKKOS/comm_kokkos.cpp:743)
WARNING: Fix with atom-based arrays not compatible with sending data in Kokkos communication, switching to classic exchange/border communication (src/KOKKOS/comm_kokkos.cpp:756)
WARNING: Fix MINIMIZE/kk not compatible with Kokkos sorting on device (src/KOKKOS/atom_kokkos.cpp:212)
WARNING: Fix with atom-based arrays not compatible with Kokkos sorting on device, switching to classic host sorting (src/KOKKOS/atom_kokkos.cpp:218)
Per MPI rank memory allocation (min/avg/max) = 849.9 | 849.9 | 849.9 Mbytes
Step Temp E_pair E_mol TotEng Press
0 0 -2821.2965 0 -2821.2965 138.72526
7 0 -2821.7926 0 -2821.7926 112.55489
Loop time of 0.319085 on 1 procs for 7 steps with 1440 atoms
100.1% CPU use with 1 MPI tasks x 1 OpenMP threads
Minimization stats:
Stopping criterion = force tolerance
Energy initial, next-to-last, final =
-2821.29649785665 -2821.79005654593 -2821.79257370597
Force two-norm initial, final = 1.6205834 0.074825584
Force max component initial, final = 0.88115703 0.020761894
Final line search alpha, max atom move = 1 0.020761894
Iterations, force evaluations = 7 11
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.31426 | 0.31426 | 0.31426 | 0.0 | 98.49
Neigh | 0 | 0 | 0 | 0.0 | 0.00
Comm | 0.00076918 | 0.00076918 | 0.00076918 | 0.0 | 0.24
Output | 0 | 0 | 0 | 0.0 | 0.00
Modify | 0 | 0 | 0 | 0.0 | 0.00
Other | | 0.004056 | | | 1.27
Nlocal: 1440 ave 1440 max 1440 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 4600 ave 4600 max 4600 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs: 232560 ave 232560 max 232560 min
Histogram: 1 0 0 0 0 0 0 0 0 0
RELAXATION OF A SUBREGION:
region active_region sphere 14.38009484925466 7.267417720097834 8.951932942652538 10.0
group active_group region active_region
151 atoms in group active_group
group matrix_atoms subtract all active_group
1289 atoms in group matrix_atoms
fix freeze_matrix matrix_atoms setforce 0.0 0.0 0.0
neigh_modify exclude group matrix_atoms matrix_atoms
minimize 0.0 1e-2 1000000 1000000000
WARNING: Fix MINIMIZE/kk not compatible with Kokkos sorting on device (src/KOKKOS/atom_kokkos.cpp:212)
WARNING: Fix with atom-based arrays not compatible with Kokkos sorting on device, switching to classic host sorting (src/KOKKOS/atom_kokkos.cpp:218)
Per MPI rank memory allocation (min/avg/max) = 849.9 | 849.9 | 849.9 Mbytes
Step Press PotEng TotEng
7 -1423.8964 -396.95962 -396.95962
21 -1378.622 -400.16562 -400.16562
Loop time of 0.518071 on 1 procs for 14 steps with 1440 atoms
100.0% CPU use with 1 MPI tasks x 1 OpenMP threads
Minimization stats:
Stopping criterion = force tolerance
Energy initial, next-to-last, final =
-396.959624263704 -400.165592268308 -400.165619106138
Force two-norm initial, final = 4.3817522 0.0079572012
Force max component initial, final = 0.98483545 0.0018882199
Final line search alpha, max atom move = 1 0.0018882199
Iterations, force evaluations = 14 24
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.5151 | 0.5151 | 0.5151 | 0.0 | 99.43
Neigh | 0 | 0 | 0 | 0.0 | 0.00
Comm | 0.00038764 | 0.00038764 | 0.00038764 | 0.0 | 0.07
Output | 0 | 0 | 0 | 0.0 | 0.00
Modify | 0.00074282 | 0.00074282 | 0.00074282 | 0.0 | 0.14
Other | | 0.001839 | | | 0.36
Nlocal: 1440 ave 1440 max 1440 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 4600 ave 4600 max 4600 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs: 37662 ave 37662 max 37662 min
Histogram: 1 0 0 0 0 0 0 0 0 0
I am using a PACE potential in my LAMMPS runs; thus, the pair_style is pace product.
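For completeness, the relevant pair setup looks roughly like this (the .yace file name and the element list are placeholders, not my actual files):
pair_style pace product
pair_coeff * * my_potential.yace Al   # element list must match the atom types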
Any insight on how to speed up this type of simulation is welcome.
Thanks in advance,
Lorenzo