Meaning of "other" reported timing when running in parallel

Good morning LAMMPS community,

  1. I am studying the parallel performance of my system (on a single processor, with different numbers of nodes per processor), and I would like to know what “other” accounts for, and whether there is any way to itemize that information in the output file.

  2. If I perform the run selecting only 1 processor, I still see some time reported under “communication” between nodes… Although it is a small number, this is surprising to me! I was expecting 0.0000. I am using OpenMPI.

Thanks in advance!
Cecilia Bores
Physics and Astronomy
Union College

PS: There are so many threads about running in parallel that I doubt this is an original question… My apologies! I spent a fair amount of time searching for this and didn’t find the answer.

It is the difference between the total wall time and the sum of the time tallied into the other categories.
Please see: 4.3. Screen and logfile output — LAMMPS documentation
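
As a quick illustration (the numbers below are made up, not taken from any actual run), the “Other” entry is simply what is left over after the explicitly timed categories are subtracted from the total loop time:

```python
# Hypothetical timing breakdown, in seconds, mimicking the per-category
# rows of a LAMMPS timing summary (all values are invented).
total_wall_time = 120.0          # total loop time reported at the end of a run
timed_sections = {
    "Pair":   85.3,              # pairwise force computation
    "Neigh":  10.2,              # neighbor list builds
    "Comm":    6.1,              # inter-processor communication
    "Output":  1.4,              # thermo/dump output
    "Modify":  9.7,              # fixes and computes
}

# "Other" is whatever the timers above did not capture.
other = total_wall_time - sum(timed_sections.values())
print(f"Other = {other:.2f} s "
      f"({100.0 * other / total_wall_time:.1f}% of the loop time)")
```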

No. If you want more detail (e.g. function-level accounting) you need to use an external profiling tool or instrumented code. On Linux there is the kernel-level built-in profiling with perf. But there are also very sophisticated packages based on code instrumentation, such as TAU.

Yes, that is intentional. LAMMPS treats the case of 1 MPI rank the same way as multiple MPI ranks and handles, e.g., periodic boundary conditions as an extension of its domain decomposition scheme.
Instead of communicating buffers across processors, you just pack, copy, and unpack them on the same CPU. That takes time, too.
You can read more about this in the manual at: 4.4. Parallel algorithms — LAMMPS documentation and in the publications describing the parallelization approach in LAMMPS (https://lammps.org/cite.html).
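
To make that concrete, here is a minimal sketch (not LAMMPS source code; the function names and data layout are invented for illustration) of why the communication timer is nonzero even on a single rank: the same pack/unpack machinery runs, only the MPI send/receive is replaced by a local copy of the buffer.

```python
import copy

def pack_border_atoms(positions, box_length, cutoff):
    """Pack coordinates of atoms near the +x boundary, shifted by the
    periodic box length, as ghost-atom data for the neighboring domain.
    (Invented helper for illustration; not the LAMMPS implementation.)"""
    buf = []
    for x, y, z in positions:
        if x > box_length - cutoff:
            buf.append((x - box_length, y, z))   # apply the periodic shift
    return buf

def exchange(buf, nprocs):
    """With more than one MPI rank this would be an MPI send/receive;
    with a single rank the buffer is simply copied locally, which still
    costs time and is tallied under communication."""
    if nprocs == 1:
        return copy.deepcopy(buf)                # local pack/copy/unpack
    raise NotImplementedError("MPI exchange not shown in this sketch")

positions = [(9.8, 1.0, 2.0), (5.0, 5.0, 5.0), (9.95, 3.0, 4.0)]
ghosts = exchange(pack_border_atoms(positions, box_length=10.0, cutoff=0.5),
                  nprocs=1)
print(ghosts)   # ghost copies of the two atoms near the +x face
```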

Thank you very much for the fast answer and the references!

Cecilia