LAMMPS gives error with OPENMP package

Greetings,

I am trying to run LAMMPS with the OPENMP package enabled. However, I get the following error on a supercomputer:
[mpiexec@qbc020] fn_kvs_get (pm/pmiserv/pmiserv_pmi_v2.c:299): assert (idx != -1) failed
[mpiexec@qbc020] handle_pmi_cmd (pm/pmiserv/pmiserv_cb.c:49): PMI handler returned error
[mpiexec@qbc020] control_cb (pm/pmiserv/pmiserv_cb.c:286): unable to process PMI command
[mpiexec@qbc020] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@qbc020] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:196): error waiting for event
[mpiexec@qbc020] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion

I compiled LAMMPS with the “MPI_Barrier” calls in timer.cpp commented out to get asynchronous parallelization. This is the content of my batch file:

#!/bin/bash
#SBATCH -N 20 # request 20 nodes
#SBATCH -n 120 # request 120 MPI processes in total
#SBATCH -t 10:00:00
#SBATCH -c 8 # request 8 CPUs (OpenMP threads) per MPI process
#SBATCH -p workq
#SBATCH -A myAllocation
#SBATCH -o slurm-CH4_wat_MPIOnly.out # optional, name of the stdout, using the job number (%j) and the first node (%N)
#SBATCH -e slurm-CH4_wat_MPIOnly.err # optional, name of the stderr, using job and first node values
#module load cuda/10.2.89/intel-19.0.5
#module load lammps/20200303
module load mpich/3.3.2/intel-19.0.5
#module load openmpi/4.0.3/intel-19.0.5

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
#mpirun -np 48 lmp -sf gpu -pk gpu 2 -in runHydr.in
echo $SLURM_NPROCS
echo $OMP_NUM_THREADS

mpirun -np $SLURM_NPROCS /myFolder/build_mpiOpenMP_SyncTiming/lmp -sf omp -pk omp $OMP_NUM_THREADS -in runHydr.in

date

Sorry, but it is impossible to provide any meaningful suggestions for this, for multiple reasons:

  • it is impossible to reproduce based on the provided information
  • you do not provide details about the specific version of LAMMPS you are using
  • there is no proof this is caused directly or indirectly by the OPENMP package
  • the MPI_Barrier calls in timer.cpp are required. LAMMPS already minimizes the number of synchronizations needed. Additional synchronizations to get more reliable timing can be enabled with a “timer sync” command
  • you didn’t explain what you want to achieve specifically, why this is important for your simulation, and on what basis you made the modifications you did
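
As a rough illustration of how the missing information could be collected, here is a minimal sketch that only prints the commands it suggests and does not run them. The executable path and input file name are copied from the batch file above; the helper name lmp_cmd and the small process and thread counts (4 and 2) are assumptions chosen for a quick test, not values from the original post.

```shell
#!/bin/bash
# Sketch: print (do not run) commands that could help narrow the problem down.
# LMP and INPUT come from the batch file above; lmp_cmd and the small
# process/thread counts used below are hypothetical, for illustration only.
LMP=/myFolder/build_mpiOpenMP_SyncTiming/lmp
INPUT=runHydr.in

# Build an mpirun command line.
# $1 = number of MPI processes, $2 = OpenMP threads per process (0 or unset = no OPENMP styles)
lmp_cmd() {
    local nprocs=$1 nthreads=${2:-0}
    if [ "$nthreads" -gt 0 ]; then
        echo "mpirun -np $nprocs $LMP -sf omp -pk omp $nthreads -in $INPUT"
    else
        echo "mpirun -np $nprocs $LMP -in $INPUT"
    fi
}

echo "$LMP -h"   # the help output lists the LAMMPS version and installed packages
lmp_cmd 4        # plain MPI run, OPENMP styles not requested
lmp_cmd 4 2      # same run with the omp suffix and 2 threads per process
```

If the small plain MPI run fails in the same way, the OPENMP package is unlikely to be the cause. It would also be worth repeating the test with an unmodified timer.cpp.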

I strongly suspect that you are subject to the GIGO (garbage in, garbage out) rule and are trying to solve the wrong problem because you are making wrong assumptions about the cause of the issues you want to resolve.