Computational optimization questions

Hello,

I’m dusting off an old model to tackle a new problem in a different parameter space. At the same time, I am re-familiarizing myself with LAMMPS after a break of several years and have some broad questions. In particular, I’m looking for useful knobs I may not be aware of that could speed up my computations. If this is not the correct venue, I apologize in advance, as this is partly a research question.

Full disclosure: I’m developing this setup on a laptop using 2-4 processors and plan to scale it up to more processors to generate production data.

I am simulating ~600 colloids in external fields that cause them to accumulate at a wall. I use DLVO-style colloid interactions (pair styles colloid, yukawa/colloid, lubricate, and brownian) and fixes to account for the external driving forces (fix addforce and fix wall/colloid). At the beginning of the simulation, the particles are evenly distributed at low volume fraction within a high-aspect-ratio box (very tall, very thin). Towards the middle and end of the simulation, they concentrate at the wall at moderate volume fractions.
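
A stripped-down sketch of the relevant commands (all numbers and flags are placeholders, not my actual parameters; the exact arguments are in the pair_style and fix doc pages) looks roughly like this:

    # sketch only -- every number/flag below is a placeholder
    pair_style  hybrid/overlay colloid 4.0 yukawa/colloid 25.0 4.0 &
                lubricate 1.0 1 1 2.05 2.5 brownian 1.0 1 1 2.05 2.5 298.15 12345
    pair_coeff  * * colloid 100.0 1.0 2.0 2.0 4.0
    pair_coeff  * * yukawa/colloid 50.0 4.0
    pair_coeff  * * lubricate 2.05 2.5
    pair_coeff  * * brownian 2.05 2.5
    fix         drive  all addforce 0.0 0.0 -1.0        # external driving force toward the wall
    fix         lowall all wall/colloid zlo 0.0 100.0 1.0 4.0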

I ran a series of simulations at unrealistically high driving forces to determine a suitable timestep. The plan is to take the largest timestep that remained stable in these deliberately harsh trial runs and use it when simulating the gentler driving forces of actual interest.

At these high driving forces, I settled on a timestep of dt = 1e-13 seconds. However, the simulations slow down considerably as they progress. For instance, the CPU time averages ~300 seconds per 1e6 steps over steps 0-10e6, but ~5000 seconds per 1e6 steps over steps 20-28e6. Under these conditions, ~30% of the particles are at the wall (70% remain suspended) at later times, compared to all particles being evenly suspended at the beginning of the simulation. Restarting the simulation (i.e. rebalancing) doesn’t help. Also, the slowdown is considerably worse if I attempt larger timesteps, which I don’t understand.

The CPU time per step will certainly decrease with more processors and a better computer, but I suspect there are other optimization tricks I can apply with the modest computational resources I am using at the moment. I’d appreciate any input on the following:

  1. For only 600 particles at these high driving forces, these CPU times seem quite slow, both at the beginning and at the end of the run, even considering how few processors I am running on. Maybe running at lower driving forces would fix this, although I don’t see how.

you have to look at the load imbalance. you may have MPI ranks that have little to no work. you also need to check how the time is distributed over the different parts of the computation in the post-run summary. that helps to narrow down where you need to optimize (e.g. running neighbor list construction more or less often, or using single vs. multi cutoff neighbor list construction).
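
for example, the neighbor list knobs look like this (placeholder values; how often rebuilds are needed depends on your skin and how fast particles move):

    # illustrative settings only
    neighbor      0.3 bin                    # skin distance, binned neighbor lists (or "multi" for mixed cutoffs)
    neigh_modify  every 1 delay 0 check yes  # rebuild only when some atom has moved more than half the skin
    timer         full                       # more detailed timing info in the post-run summary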

  2. Based on forum posts, for only 600 particles, I don’t think running on GPUs will help.

correct. (moderate) threading might help, as it decomposes the work over particles and not over space. but there is growing overhead with an increasing number of threads (for USER-OMP styles), and Amdahl’s law applies (the amount of non-threaded code limits how much speedup you can get).
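
for example, assuming your LAMMPS binary was built with the USER-OMP package (executable name, file name, and core counts below are just placeholders):

    # 2 MPI ranks x 2 OpenMP threads; -sf omp switches to /omp style variants where available
    mpirun -np 2 lmp -sf omp -pk omp 2 -in in.colloid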

  3. It feels like “balance” might help with this. Maybe playing with “neighbor” can help a little. Anything else?

before looking at balance, i would look at the processors command. for slab/wall configurations with periodicity in x and y, using “processors * * 1” often takes care of the bulk of the load imbalance.
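
i.e. something like this early in the input, before the box is defined:

    # one processor along z (the direction with the density gradient);
    # decompose only in the homogeneous, periodic x and y directions
    processors * * 1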

however, you also have to factor in that your computational effort will grow as particles cluster together, since the number of neighbors per particle will increase (break down your run into multiple parts and monitor the reported average number of neighbors in the post-run summary after each part). that part is just the physics of the problem.
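
a sketch of what i mean by breaking the run into parts (chunk size is arbitrary; each run command prints its own summary with the timing breakdown and the average neighbors/atom):

    variable part loop 28
    label run_part
      run 1000000          # each 1e6-step chunk ends with its own post-run summary
    next part
    jump SELF run_part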

  4. Finally, I might reconsider my simulation approach. For example, I could run a Monte Carlo simulation to find particle configurations for a “partially deposited” system and then run conventional molecular dynamics from that starting point to reduce the overall simulation time. Can “minimize” be used for this with these colloid potentials? However, from what I have read, the angular momentum may not be accounted for during MC optimization.

i cannot help with this. sorry.

axel.