Pinning Main Process to Specific Node

Greetings All!

A question came up recently regarding the behavior of LAMMPS during I/O. As I understand it, LAMMPS decides at execution time which of the nodes supplied by the scheduler will be the “master” node of the allotment. If that is correct, is there one node that collects the data from the other nodes before writing it out to disk? If so, would there be a significant benefit to having the master node of the job run on an I/O-rich node that is designed to handle such large bursts of data? And, if so, is it possible to select the master node at execution time so that the I/O-rich node does the job for which it is intended?

Best,

hi jeff,

most of the I/O is done by MPI rank 0. there are special cases
(parallel I/O, multi-partition runs) where multiple or all nodes
perform I/O operations.

which node gets assigned rank 0 is usually decided by the MPI library,
most commonly through the order of nodes in the node file (if you use
one) or through the order in which nodes/cores are listed in the batch
system allocation (if you launch through Torque's TM library, for example).

due to the way LAMMPS works, it would be possible to
program a "shim" that creates a new communicator and
swaps MPI ranks around, so that rank 0 ends up on
the preferred node (if there is a way to identify that node, of course).
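
to make this more concrete, here is a minimal sketch of such a shim
(plain MPI, not actual LAMMPS code). it assumes the I/O-rich node can
be identified by its hostname; the name "ionode01" below is only a
placeholder. MPI_Comm_split() is used with a sort key chosen so that a
process on that node becomes rank 0 of the new communicator:

```c
/* sketch of a rank-reordering "shim": put rank 0 on a preferred node.
   the hostname "ionode01" is a placeholder for the I/O-rich node. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    char host[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI_Get_processor_name(host, &len);

    /* processes on the preferred node get smaller sort keys, so
       MPI_Comm_split() orders them first and one of them becomes rank 0 */
    int on_io_node = (strcmp(host, "ionode01") == 0);
    int key = on_io_node ? world_rank : world_rank + world_size;

    MPI_Comm reordered;
    MPI_Comm_split(MPI_COMM_WORLD, /* color */ 0, key, &reordered);

    int new_rank;
    MPI_Comm_rank(reordered, &new_rank);
    printf("host %s: world rank %d -> reordered rank %d\n",
           host, world_rank, new_rank);

    /* ... run the simulation with 'reordered' instead of MPI_COMM_WORLD ... */

    MPI_Comm_free(&reordered);
    MPI_Finalize();
    return 0;
}
```

a driver program would then run the simulation with the reordered
communicator instead of MPI_COMM_WORLD (the LAMMPS library interface,
for example, lets you pass a communicator when creating an instance).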

cheers,
     axel.

Thanks Axel! That’s what I wanted to know :wink:

I’ll look into this a bit further and report back to the list if anything promising comes of it!

Cheers!

jeff,

one additional piece of information:

if you are running OpenMPI (and you really should), you can check
the MPI rank assignments even with a simple shell script or through
printenv.

OpenMPI sets $OMPI_COMM_WORLD_RANK for each process, and you can use
$OMPI_COMM_WORLD_LOCAL_RANK to tell the "relative" rank of an MPI task
on any given physical node.

just try: mpirun printenv | grep OMPI_
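
or, if you want something slightly more structured than grepping the
printenv output, a small test program like this sketch (compile with
mpicc, launch through mpirun) prints those variables next to the
hostname of each process:

```c
/* sketch: print OpenMPI's rank-related environment variables per process.
   compile with mpicc and launch e.g. with: mpirun -np 8 ./rankinfo */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    char host[256] = "unknown";
    gethostname(host, sizeof(host));

    /* these variables are set by OpenMPI's launcher for every process */
    const char *world = getenv("OMPI_COMM_WORLD_RANK");
    const char *local = getenv("OMPI_COMM_WORLD_LOCAL_RANK");

    printf("host %s: OMPI_COMM_WORLD_RANK=%s OMPI_COMM_WORLD_LOCAL_RANK=%s\n",
           host,
           world ? world : "(not set)",
           local ? local : "(not set)");
    return 0;
}
```

the process that reports OMPI_COMM_WORLD_RANK=0 is the one that will do
the bulk of the I/O in a default LAMMPS run.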

cheers,
     axel.