Settings for comm_style and comm_modify to optimize single chain simulations


Sorry, I looked but could not find a definitive answer anywhere.

Is there an established scheme (or knowledge base or an example) for setting comm_style and comm_modify to optimize speed in single chain simulations?

This assumes the potentials include, at a minimum, bond, angle, and nonbonded terms.

Or is it always faster to run on a single processor?

I would think that by this time there would be some conventional wisdom we might draw from, but it is not readily apparent.



I’m confused by what you mean by the following: “is it always faster to run on a single processor?”

OK, sorry, I skipped over some information. That comment came from tests with additional processors in which the simulation slowed down as the processor count increased, presumably due to the poor communication that occurs when some subdomains contain no atoms. The single-processor case has generally been the fastest. That seems to indicate that some comm_modify change is needed.


If you have an inhomogeneous distribution of mass or force calculations, it’s probably advantageous for you to use comm_style tiled:

comm_style command — LAMMPS documentation
“This command sets the style of inter-processor communication of atom information that occurs each timestep as coordinates and other properties are exchanged between neighboring processors and stored as properties of ghost atoms.”

in tandem with fix balance to redistribute subdomains according to computational cost (perhaps using the weight keyword of fix balance with an atom-style variable).
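As an illustrative sketch of that combination (the fix ID, rebalancing interval, threshold, and variable name here are hypothetical placeholders; tune them for your system):

```
# switch from the default brick decomposition to tiled,
# which allows non-grid-like subdomains
comm_style      tiled

# rebalance every 1000 steps whenever the per-processor
# imbalance exceeds a threshold of 1.1, using recursive
# coordinate bisectioning (rcb requires comm_style tiled)
fix             bal all balance 1000 1.1 rcb

# optionally weight the per-atom cost with an atom-style
# variable (mycost is a hypothetical variable you define):
# fix           bal all balance 1000 1.1 rcb weight var mycost
```

With a mostly empty box, rcb lets the subdomains shrink around the chain instead of leaving most processors with empty bricks.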

How long a chain are you talking about here?


Thank you for sending that. It looks like fix balance rebalances on the fly, which is what is needed; the comm_modify documentation does not cover that. That would probably be the next thing to try. There is probably some overhead with it, but it is worth a shot.



As small as 10, but nominally 100-400. In any event, the box is mostly empty.


> As small as 10, but nominally 100-400. In any event, the box is mostly empty.

With next to no atoms, there is very little work to do and thus nothing to parallelize over that would offset the overhead of communication. The typical amount of work in classical MD is dominated by pairwise interactions, which scales as the effort per pair of atoms times the typical number of neighbors per atom times the number of atoms. So for low-density systems you have even less work than for typical dense systems. Unless you are applying a very expensive potential, there is no chance of seeing any parallel speedup.

Please recall Amdahl’s law, which states that the maximum speedup is limited by the non-parallel part of a code, and that is not even considering the additional overhead from communication, synchronization, and computing what to put where.
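For reference, with $p$ the fraction of the runtime that parallelizes and $N$ the number of processors, Amdahl's law gives the ideal speedup (ignoring all communication overhead) as:

```latex
S(N) = \frac{1}{(1 - p) + p/N},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

So even a 10% serial fraction caps the speedup at 10x, and any per-step communication cost only pushes the real curve further below that bound.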

Why do you want to parallelize calculations that are already very fast? Look at the output of your calculations: LAMMPS prints a summary of where in the code the time is spent at the end of each run.