OpenMP: locally OK, apparently inactive on remote MPI nodes

Hello,

I'm trying to use the USER-OMP package and found that OpenMP is active only on the first physical MPI node, but not on the remote MPI nodes: there, the CPU usage hovers at or only slightly above 100%, where I expect 300%. See below for an example, testing with examples/shear/in.shear.

I set OMP_NUM_THREADS and pass it via mpirun -x. I verified that remote MPI processes see this value.
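
(For the record, the check was roughly the following; the rank count is just an example.)

  # print the variable as each remote rank sees it
  mpirun -x OMP_NUM_THREADS -machinefile $PBS_NODEFILE -np 4 \
      sh -c 'echo "$(hostname): OMP_NUM_THREADS=$OMP_NUM_THREADS"'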

I'm using: distro EL5.7, Intel compilers 11.1.073, OpenMPI-1.4.4

With best regards,
Michael

---- Compilation -------------------------------------------
packages=(
         no-asphere
        yes-class2
        yes-colloid
         no-dipole
         no-fld
         no-gpu
         no-granular
         no-kim
        yes-kspace
        yes-manybody
        yes-mc
        yes-meam
        yes-molecule
         no-opt
         no-peri
        yes-poems
        yes-reax
        yes-replica
         no-shock
         no-srd
         no-xtc

         no-user-awpmd
         no-user-cg-cmm
         no-user-cuda
         no-user-eff
         no-user-ewaldn
         no-user-sph
        yes-user-atc
        yes-user-misc
        yes-user-reaxc

        # must come last -- doc/Section_accelerate.html
        yes-user-omp
)

    make -C src ${packages[@]}

    model_opt="-mcmodel=medium"

    make -C lib/atc -j 6 -f Makefile.icc CC="mpicc $model_opt"
    # awpmd
    # cuda
    # gpu
    # kim
    # linalg - not needed if we use BLAS/LAPACK
    make -C lib/meam -j 1 -f Makefile.ifort F90="ifort $model_opt"
    make -C lib/poems -j 6 -f Makefile.icc CC="icc $model_opt"
    make -C lib/reax -j 6 -f Makefile.ifort F90="ifort $model_opt"

    # NB: it's non-canonically "CCFLAGS", not "CFLAGS"
    make -C src -j 1 \
        CC="mpic++" \
          CCFLAGS="$model_opt -openmp -O2 -funroll-loops -fstrict-aliasing" \
        LINKFLAGS="$model_opt -openmp" \
        LINK=$CC \
        $ARCH

    make -C tools all
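
As a quick single-node sanity check (the binary path and thread count are just what I use here; adjust for your build):

    # the Intel OpenMP runtime should show up among the linked libraries
    ldd src/lmp_openmpi | grep -i iomp

    # and a single-process run should honor OMP_NUM_THREADS locally
    ( cd examples/shear && OMP_NUM_THREADS=4 ../../src/lmp_openmpi -sf omp -in in.shear )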

First, please check whether OpenMPI is configured with processor
affinity enabled by default; if it is, change the binding to
per-socket or disable it.
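
For example, with OpenMPI 1.4 something along these lines shows and overrides the default (flag names assume a 1.4.x mpirun):

  # inspect the build's affinity-related defaults
  ompi_info --param mpi all | grep -i affinity

  # disable binding for a run, or bind per socket instead of per core
  mpirun --mca mpi_paffinity_alone 0 -np 4 lmp_openmpi -sf omp -in in.shear
  mpirun --bind-to-socket --bysocket -np 4 lmp_openmpi -sf omp -in in.shear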

Second, use the -npernode flag (-bynode might work as well)
with mpirun to make sure the MPI tasks are placed properly
across the nodes. Note that if mpirun places N MPI tasks on a
node with N CPU cores, each task will show 100% CPU even if
multiple threads are running.
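
For example, with two 8-core nodes and two MPI tasks per node (the numbers are illustrative):

  export OMP_NUM_THREADS=4
  mpirun -x OMP_NUM_THREADS -npernode 2 -np 4 lmp_openmpi -sf omp -in in.shear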

cheers,
    axel.

Solved -- I was trying to be too clever.

The cause was that I was using a MOAB feature <http://www.adaptivecomputing.com/resources/docs/mwm/6-0/5.3nodeaccess.php> to reserve entire nodes (typically 8 physical cores each) while receiving a pre-thinned PBS_NODEFILE that lists each node name only ppn times:

  #!/bin/bash
  #PBS -l nodes=2:ppn=2
  #PBS -l naccesspolicy=SINGLEJOB
  ...

  # grab first (and usually only) ppn value of the job
  ppn_mpi=$( uniq -c $PBS_NODEFILE | awk '{print $1; exit}' )

  # number of cores on first execution node
  ppn_phys=8

  # calculate number of threads available per MPI process
  export OMP_NUM_THREADS=$(( ppn_phys / ppn_mpi ))

  mpirun -x OMP_NUM_THREADS \
      -machinefile $PBS_NODEFILE \
      -np $ppn_mpi \
      …

This apparently made OpenMPI stingy on the remote nodes, permitting the use of only the ppn cores listed in $PBS_NODEFILE and no more. (A defensible position, actually.)
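
To illustrate, with the nodes=2:ppn=2 request above the pre-thinned nodefile looks roughly like this (node names as in the ps output further down):

  $ cat $PBS_NODEFILE
  n282
  n282
  n276
  n276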

Solution: request nodes from PBS with ppn=max and leave the task placement to mpirun via -npernode instead:

  #!/bin/bash
  #PBS -l nodes=2:ppn=8
  …

  ppn_pbs=$( uniq -c $PBS_NODEFILE | awk '{print $1; exit}' )
  ppn_mpi=2   # user choice
  export OMP_NUM_THREADS=$(( ppn_pbs / ppn_mpi ))

  mpirun -x OMP_NUM_THREADS \
      -machinefile $PBS_NODEFILE \
      -npernode $ppn_mpi \
      lmp_openmpi \
      -sf omp -in in.AB

Below is sample ps(1) output with threads expanded. The %CPU values per thread are now near 100, as expected.

Best,
Michael

# primary node (ranks 0 and 1)
$ ssh n282 psuser -- -m
USER PID S STARTED TIME %CPU %MEM VSZ RSS COMMAND
stern 19149 - 14:43:36 00:00:00 0.0 0.0 8724 1060 /bin/bash /var/spool/torque/mom_priv/jobs/219751.sched1.carboncluster.SC
stern - S 14:43:36 00:00:00 0.0 - - - -
stern 19153 - 14:43:36 00:00:00 0.0 0.0 58484 2460 mpirun -x OMP_NUM_THREADS -machinefile /var/spool/torque/aux//219751.sch
stern - S 14:43:36 00:00:00 0.0 - - - -
stern 19157 - 14:43:37 00:02:37 393 0.2 4153812 51972 lmp_openmpi -sf omp -in in.AB
stern - R 14:43:37 00:00:39 98.9 - - - -
stern - S 14:43:37 00:00:00 0.0 - - - -
stern - S 14:43:37 00:00:00 0.0 - - - -
stern - S 14:43:37 00:00:00 0.0 - - - -
stern - R 14:43:37 00:00:39 98.3 - - - -
stern - R 14:43:37 00:00:39 98.0 - - - -
stern - R 14:43:37 00:00:39 98.3 - - - -
stern 19158 - 14:43:37 00:02:37 393 0.2 4143680 50296 lmp_openmpi -sf omp -in in.AB
stern - R 14:43:37 00:00:39 99.1 - - - -
stern - S 14:43:37 00:00:00 0.0 - - - -
stern - S 14:43:37 00:00:00 0.0 - - - -
stern - S 14:43:37 00:00:00 0.0 - - - -
stern - R 14:43:37 00:00:39 98.2 - - - -
stern - R 14:43:37 00:00:39 97.8 - - - -
stern - R 14:43:37 00:00:39 98.2 - - - -

# secondary node (ranks 2 and 3)
$ ssh n276 psuser -- -m
USER PID S STARTED TIME %CPU %MEM VSZ RSS COMMAND
stern 12404 - 14:43:38 00:00:00 0.0 0.0 52196 1472 /opt/soft/openmpi-1.4.4-intel11-1/bin/orted --daemonize -mca ess env -mc
stern - S 14:43:38 00:00:00 0.0 - - - -
stern 12405 - 14:43:38 00:00:47 399 0.2 4138648 50240 lmp_openmpi -sf omp -in in.AB
stern - R 14:43:38 00:00:12 100 - - - -
stern - S 14:43:38 00:00:00 0.0 - - - -
stern - S 14:43:38 00:00:00 0.0 - - - -
stern - S 14:43:38 00:00:00 0.0 - - - -
stern - R 14:43:38 00:00:11 99.5 - - - -
stern - R 14:43:38 00:00:11 99.4 - - - -
stern - R 14:43:38 00:00:11 99.5 - - - -
stern 12406 - 14:43:38 00:00:47 399 0.2 4137728 49700 lmp_openmpi -sf omp -in in.AB
stern - R 14:43:38 00:00:12 100 - - - -
stern - S 14:43:38 00:00:00 0.0 - - - -
stern - S 14:43:38 00:00:00 0.0 - - - -
stern - S 14:43:38 00:00:00 0.0 - - - -
stern - R 14:43:38 00:00:11 99.4 - - - -
stern - R 14:43:38 00:00:11 99.5 - - - -
stern - R 14:43:38 00:00:11 99.4 - - - -