Efficiency of multiple fix adapts

Hi,

My name is Aradhya Rajanala, and I am a graduate student in the Goldman Lab at Georgia Tech.

I’m working on a system in which I have a cylinder made of bonded atoms in the shape of a plant root. My goal is to grow this system by using fix adapt to elongate the bonds as a function of length to emulate root growth. It is important that I am able to set the time at which the bonds grow as a function of their index – varying the speed and timing of growth of the bonds is crucial to my research question. At the moment, the simulation runs correctly but prohibitively slowly. With 2000 particles and 7000 bonds, I can run 2*10^8 timesteps in 20-40 minutes (which has already given valuable results), but in the range of ~7000 particles and ~22000 bonds, the simulation takes dozens of hours to run, as tested on several different machines of varying performance.

Currently, I am using a for loop with fix adapt to set the rate of growth. I recognize that this is the major slowdown in the simulation and an inefficient way to update the bonds (running thousands of fix commands is almost certainly problematic), but I was unable to find a better way to accomplish this.

In short, I want to be able to set an equation for the bond lengths as a function of bond type without drastically reducing the simulation speed. For example, is there a way to run a single fix adapt that updates the bond lengths as a function of their index (instead of one fix per bond type)?

I’ve included below a simplified version of what I believe are the lines that cause the major slowdown. Happy to provide more details as needed or show visualizations of the outputs for context.

Thank you so much for your help,
Aradhya Rajanala

variable a loop 7360 #number of adapting bonds in the system
label loop
variable size$a equal #some function of a and elaplong here
fix $a all adapt 5000 bond harmonic r0 $a v_size$a reset no
next a
jump SELF loop
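
(For reference, a purely hypothetical stand-in for that growth function - not the actual research expression - could be:

variable size$a equal "1.0+1.0e-8*elaplong*$a"   # illustrative placeholder only

i.e., each bond’s rest length ramps up linearly in time at a rate proportional to its index.)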

What kind of hardware are you running on? And how do you run LAMMPS? Are you using any of the acceleration options? Can you post a complete input deck for a typical (test) run, so people can make some independent benchmarks?

Dozens of hours is “nothing”. There are people running much bigger simulations and they have to run (in chunks through a batch system) for weeks or even months until they have accumulated a sufficient amount of results.

Have you looked at bond style lepton or bond style table?

If you just want to change the bond length why not change bond_coeffs directly?

To know the amount of time spent, you need to do some proper profiling, but the timing summary at the end of a run can already be quite helpful.


Thank you so much for the reply and advice!

Currently I’m running this on my personal computer. I have tried using MPI parallelization on a Google Cloud server to take advantage of the parallelism in LAMMPS, but found that using multiple cores actually slowed the simulation down (my best guess is that because I have some long-range bonds in my system, the communication cutoffs must be very high relative to the size of the system).
I’m happy to attach an input deck shortly (my account is new so I am still unable to attach them right now).
I am not currently using any accelerators but I will definitely look into them, thank you for the advice.

I agree that dozens of hours by itself isn’t problematic, but I believe it should be possible to reduce the runtime of this simulation - I’ve noticed that if I grow all of the bonds with a single fix adapt, the time to run decreases significantly (by a factor of 20 or so). This unfortunately does not allow me to set the bonds individually.

I do hope to scale up further (to ~10^6 particles or more) - I have been gradually scaling this simulation up, and am now running into times that make it difficult to iterate.

To clarify, I want the bonds to change as a function of both the bond index and time (I’m currently using elaplong to get the timestep) - I’d want each bond to increase its length as the simulation runs. Bond style lepton, table, and changing the bond coefficients directly would allow me to set the bond lengths at the beginning of the simulation, but I don’t think I can use them to change the bond lengths gradually over the simulation (please correct me if I am wrong on this).

Including a timing summary below from a recent run. It looks like the Modify section is by far taking the most time (which I think is likely the fix adapt loop I was referring to).

Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 1.1033     | 1.1033     | 1.1033     |   0.0 |  0.00
Bond    | 1836.7     | 1836.7     | 1836.7     |   0.0 |  2.45
Neigh   | 2.6846     | 2.6846     | 2.6846     |   0.0 |  0.00
Comm    | 2.9734     | 2.9734     | 2.9734     |   0.0 |  0.00
Output  | 11.64      | 11.64      | 11.64      |   0.0 |  0.02
Modify  | 73025      | 73025      | 73025      |   0.0 | 97.44
Other   |            | 62.77      |            |       |  0.08

Nlocal:           7361 ave        7361 max        7361 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:          33226 ave       33226 max       33226 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 33226
Ave neighs/atom = 4.5137889
Ave special neighs/atom = 6.9365575
Neighbor list builds = 379
Dangerous builds = 0
Total wall time: 21:00:28

Thanks once again for the help - I really appreciate it.

I am certain that your institution has a High-Performance Computing center with both proper HPC computing resources and competent HPC staff that can help you to run on those resources efficiently.
Also, for more demanding calculations, there is always the option to apply for time on a national supercomputing center (it may not be needed in your case) through the NSF-sponsored ACCESS organization: https://access-ci.org/

You can upload the files to a service like Dropbox, Google Drive, MS One Drive or similar and provide a (shareable) link.

It may be possible with bond style lepton, although a little trickery may be needed.

You are correct, and this is unavoidable with those styles. The proper solution for the best performance would be a custom bond style. Let me think about it for a bit and also have a look at the examples using fix adapt.

What you are doing is a bit unusual, but unusual challenges are also interesting… :wink:


One thing you should consider: rather than creating 7360 instances of fix adapt, you could try to do this with just one instance. For that you would have to generate the commands with some Python or similar script (since you cannot use the loop feature) and then do:

include "myadapt.txt"

That file would first have the various variable definitions, then

fix bondadjust all adapt 5000 &
   bond harmonic r0 1 v_size1 &
   bond harmonic r0 2 v_size2 &
   bond harmonic r0 3 v_size3 &
   bond harmonic r0 4 v_size4 &
[...]
   bond harmonic r0 7360 v_size7360 &
   reset no

That may already help speed things up.
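
As a sketch of such a generator (in Python, as suggested above; the growth expression, rate constant, and output file name are illustrative placeholders, not taken from the real input):

# generate_adapt.py - writes "myadapt.txt": one growth variable per bond
# type, then a single combined fix adapt command over all bond types.
# The growth law below is a hypothetical placeholder.
nbonds = 7360  # number of adapting bond types in the system
with open("myadapt.txt", "w") as f:
    # one equal-style variable per bond type, a function of elaplong and index
    for i in range(1, nbonds + 1):
        f.write(f'variable size{i} equal "1.0+1.0e-8*elaplong*{i}"\n')
    # one fix adapt covering every bond type, joined by '&' line continuations
    f.write("fix bondadjust all adapt 5000 &\n")
    for i in range(1, nbonds + 1):
        f.write(f"   bond harmonic r0 {i} v_size{i} &\n")
    f.write("   reset no\n")

Run it once, and then include "myadapt.txt" replaces the entire loop in the input.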


We definitely do - I’m planning to use them as we scale up. Currently, I am not getting any benefit from parallelization (probably because of this fix adapt issue), so increasing the number of cores wasn’t helpful. I hope to “grow” this system into a bed of granular media, at which point using the HPC center will be extremely helpful (I have already tested this at smaller scale and found that running on many cores helps a lot in that case).

Would it be ok to email you a version for now? I’m a little leery of posting my code publicly, given that it’s still unpublished. I am happy to make a simpler test case as an example and post it here once I find solutions so there’s a record of it for other users.

These both sound like solutions that would help tremendously.

I’m delighted to hear that! I was uncertain whether I missed something obvious, I’m glad this seems to be interesting.

This I’m sure will also help - I didn’t know you could specify multiple adapt types with a single fix. I’m in the process of implementing this such that my code has no more for loops - I’ll hopefully have this done today and will update here on how it impacts performance. Depending on how this improves things, I’ll see if the other possibilities you mentioned are necessary.

Again, thank you so much for the help! I really appreciate your time and expertise.

Yes, that would be OK in this case because there is a possibility for me to implement some new functionality into LAMMPS that may be of general interest beyond your needs.

But you should talk this over with your adviser, since if we take this private, it would become more of a collaboration and I would become somewhat of a consultant on your project, and that would require that I expect to get the proper credit for that, too. :wink:

Otherwise, I would request to keep this public, which means that you would have to create a test case that still uses the functionality you need, but without it resembling your research project. For cases that do not tickle my interest, that would be the only option, but as I already mentioned, there seems to be some significant “hack value™” here.

I totally understand - I’ll start by creating this “minimal” test case and post it here along with trying the first solution of avoiding loops.

Happy to talk about the private option if needed, but I’m confident I can make something that will allow us to discuss the problem in this thread. I’ll follow up later today.

Hi,

Update on the above - I’ve successfully made a minimal test case that shows the problem, and it seems the fix above has a significant effect.

The test case is a LAMMPS input with 10,000 sphere particles stacked vertically (atom style bond). These are connected with bonds sequentially (atom 1 links to atom 2, 2 links to 3, etc.). The bonds are set to grow at different rates depending on their index, as a ramp function.

I tested 3 cases, each accessible through a different commented code block: a single fix adapt command that affects all bonds in the same way (as a control test), the for loop implementation that I had described earlier, and the large fix adapt command that affects each bond separately.

In the control case, the modify time was 5 seconds (40% of runtime).
In the for loop implementation, the modify time was 20 seconds (70% of runtime).
In the large fix adapt case, the modify time was 7 seconds (43% of runtime).

Those are all on my personal device, but hopefully this comparison holds true across multiple machines.

I’ll work to implement this into the simulation I was describing earlier and update here if I see similar efficiency improvements (or don’t). It seems as though this may be a good enough solution for now (a more elegant lepton or custom bond style solution may be even better…I may attempt this and follow up if I do).

Attaching an input deck here if anyone wants to take a look at these tests (the files included are the main .lmp script, the atom file for the structure, and the include file for the large adapt command). Let me know if there are any questions or concerns about this code.

https://drive.google.com/drive/folders/1szev8fbv0MdFTpmDogRW5wo-jGvjpMgD?usp=sharing

Thank you so much again for the help,
Aradhya Rajanala

By the way, is the syntax with the &s general? I haven’t seen anything like it in the LAMMPS documentation and it seems very useful to run the same command with different arguments on different groups.

The Google drive folder only has two data files and an include file, but is missing the main LAMMPS input.

Sorry! Uploaded the wrong file, should be correct now.

Thanks for the example input.
I have converted the “control” example to use bond style lepton, but it is even slower than the “loop” case with many fix adapt instances. The fact that in the “control” case you still see significant time spent in Modify is because you have fix nve and fix viscous, while the only force computation is the bond computation (which is generally fast). If you also had a pair style, that would likely dominate (since it loops over all pairs within the cutoff, not just the list of bonds). The difference between the “control” and the “single adapt” case is due to having to evaluate several thousand variable expressions (which is slow).

The reason bond style lepton is so slow is that, with your fix adapt settings, the expressions only need to be evaluated every 500 steps, whereas with bond style lepton the bond force expression needs to be evaluated every step. That is why it is slower, even though an individual Lepton expression evaluation is faster than a LAMMPS variable evaluation. For the case of one expression per bond type, the Lepton evaluation would be even slower.

Bottom line, condensing the many fix adapt instances into a single instance is as much optimization as can be achieved for this use case, since fix adapt updates the bond coefficients very gradually, only once every 500 steps. That makes the command contribute very little to the total simulation time.
It is the (internal) looping over thousands of fix instances that limits the performance.

The ‘&’ character is a line continuation character (as in Fortran 90), so all this does is combine multiple lines of the input into a single very, very long command. Line continuation is processed at the input pre-processing stage. See rule 1 on the following page of the manual: 5.2. Parsing rules for input scripts — LAMMPS documentation

Oof – if your bonds become long enough I’d imagine they will start interfering with efficient domain decomposition.

You should really consider either driving the bond-particles’ forces through an external script, or writing a custom fix. The fix would look like this:

  • Initialise:
    • read in a list of “bonds” connecting atoms. (With atom tags, each bond has an unambiguous smaller-tag atom and larger-tag atom)
    • set up memory for smaller-tag coordinates, larger-tag coordinates, bond lengths, and any other bond variables you need.
  • Per-step:
    • Gather all smaller-tag and larger-tag coordinates using MPI
    • Update bond lengths and variables
    • Each processor imposes forces as needed on their own atoms

A bit of work upfront will save you a lot of compute.
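
A simpler variant of the external-script route - updating bond coefficients directly from a driver between short runs, as also suggested earlier in this thread - can be sketched with the LAMMPS Python module. The setup file name, force constant, and growth law below are hypothetical placeholders:

# drive_growth.py - alternate short runs with direct bond_coeff updates.
# All names and numbers below are illustrative, not from the actual project.
from lammps import lammps

lmp = lammps()
lmp.file("in.root_setup")   # hypothetical input that builds the bonded root

nbonds = 7360
total_steps = 200000
chunk = 5000                # update the bond rest lengths every 5000 steps

for step in range(0, total_steps, chunk):
    for i in range(1, nbonds + 1):
        # placeholder ramp: each bond type grows at a rate set by its index
        r0 = 1.0 + 1.0e-8 * step * i
        lmp.command(f"bond_coeff {i} 100.0 {r0:.6f}")  # harmonic: K r0
    lmp.command(f"run {chunk} post no")

This avoids fix adapt and per-step variable evaluation entirely, at the cost of driving the time loop from outside LAMMPS.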

Extending pair style list may be less work. It was designed to handle very long bonds and thus get around the limitation that all bonded pairs must be within the communication cutoff for each subdomain.

Hi,

Just wanted to send an update - it seems this fix works in my research cases and reduces the runtime to under an hour (less than 5% of the original runtime, with 50% of the time spent in Modify instead of 97.5%), which is even more than I’d hoped to see. Thank you so much for the help!

I’ll definitely keep these ideas in mind and ask if I have any questions on them after doing some research if they become relevant (especially extending pair style). I suspect that based on the scaling of my system, the final bond lengths that I’ll reach will still be comparable to the size of the granular particles (each particle in this case is an analog for a cell in the plant root, and the size of the largest bonds is likely to be no more than 5-10x the size of the granular material). If I am incorrect, I’ll definitely be looking into your recommendations in much greater detail.

Thank you very much once again,
Aradhya