Hi all,
I encountered the following error during a simulation of granular clumps undergoing gravity-driven free fall. The simulation was run in MPI+OpenMP hybrid mode, executed with:
export OMP_NUM_THREADS=2
export OMP_PLACES=cores
export OMP_PROC_BIND=close
mpirun -np 48 \
--map-by ppr:12:numa:pe=2 \
--bind-to core \
./lmp -sf omp -pk omp 2 -in in.AK1
Error message:
[dell7875-Precision-7875-Tower:00000] *** An error occurred in MPI_Waitany
[dell7875-Precision-7875-Tower:00000] *** reported by process [2109276161,21]
[dell7875-Precision-7875-Tower:00000] *** on communicator MPI_COMM_WORLD
[dell7875-Precision-7875-Tower:00000] *** MPI_ERR_TRUNCATE: message truncated
[dell7875-Precision-7875-Tower:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[dell7875-Precision-7875-Tower:00000] *** and MPI will try to terminate your MPI job as well)
The simulation runs without issue in pure MPI mode (i.e., with no OpenMP threads).
I’ve attached the input script and a short animation of the simulation to help illustrate the clump setup.

I found a related post here:
but I’m not sure whether it’s directly relevant or if the issue has since been resolved.
Any ideas about the root cause of this error in the hybrid setup? Could it be related to rigid body communication, memory alignment across threads, or known issues with the rigid/small fix?
Thanks in advance for your help!
System Information
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Vendor ID: AuthenticAMD
Model name: AMD Ryzen Threadripper PRO 7995WX 96-Cores
CPU family: 25
Model: 24
Thread(s) per core: 1
Core(s) per socket: 96
Socket(s): 1
Stepping: 1
Frequency boost: enabled
CPU(s) scaling MHz: 12%
CPU max MHz: 5187.0000
CPU min MHz: 545.0000
BogoMIPS: 4992.50
Caches (sum of all):
L1d: 3 MiB (96 instances)
L1i: 3 MiB (96 instances)
L2: 96 MiB (96 instances)
L3: 384 MiB (12 instances)
NUMA:
NUMA node(s): 4
NUMA node0 CPU(s): 0-7,32-39,64-71
NUMA node1 CPU(s): 16-23,48-55,80-87
NUMA node2 CPU(s): 24-31,56-63,88-95
NUMA node3 CPU(s): 8-15,40-47,72-79
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Mitigation; Safe RET
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsx async abort: Not affected
Memory Configuration (numactl --hardware):
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39 64 65 66 67 68 69 70 71
node 0 size: 128131 MB
node 0 free: 22513 MB
node 1 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55 80 81 82 83 84 85 86 87
node 1 size: 128971 MB
node 1 free: 127394 MB
node 2 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63 88 89 90 91 92 93 94 95
node 2 size: 129015 MB
node 2 free: 126950 MB
node 3 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47 72 73 74 75 76 77 78 79
node 3 size: 128968 MB
node 3 free: 125703 MB
node distances:
node 0 1 2 3
0: 10 12 12 12
1: 12 10 12 12
2: 12 12 10 12
3: 12 12 12 10
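As a sanity check on the launch layout, here is a minimal Python sketch (not part of the run itself) that parses the `node N cpus:` lines from the numactl output above and checks that `--map-by ppr:12:numa:pe=2` fits: 12 ranks x 2 cores = 24 cores per NUMA node, so 48 ranks and 96 cores in total. The helper names `parse_numa_cpus` and `layout_fits` are my own and purely illustrative.

```python
import re

# CPU lists copied from the `numactl --hardware` output above.
NUMACTL = """\
node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39 64 65 66 67 68 69 70 71
node 1 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55 80 81 82 83 84 85 86 87
node 2 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63 88 89 90 91 92 93 94 95
node 3 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47 72 73 74 75 76 77 78 79
"""


def parse_numa_cpus(text):
    """Map NUMA node id -> list of CPU ids (hypothetical helper)."""
    nodes = {}
    pattern = r"^node (\d+) cpus:((?: \d+)+)$"
    for match in re.finditer(pattern, text, re.MULTILINE):
        nodes[int(match.group(1))] = [int(c) for c in match.group(2).split()]
    return nodes


def layout_fits(nodes, ranks_per_node, cores_per_rank):
    """True if every NUMA node has enough CPUs for the requested ranks."""
    need = ranks_per_node * cores_per_rank
    return all(len(cpus) >= need for cpus in nodes.values())


nodes = parse_numa_cpus(NUMACTL)
total_ranks = 12 * len(nodes)          # ppr:12:numa
total_cores = total_ranks * 2          # pe=2, OMP_NUM_THREADS=2
print(len(nodes), total_ranks, total_cores, layout_fits(nodes, 12, 2))
```

So, at least on paper, the mapping uses every core exactly once and no rank has to span two NUMA nodes.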
LAMMPS Build Info:
Large-scale Atomic/Molecular Massively Parallel Simulator - 4 Feb 2025 - Development
Git info (develop / patch_4Feb2025-105-gaaa81b2576)
OS: Linux "Ubuntu 24.04.2 LTS" 6.11.0-21-generic x86_64
Compiler: Clang C++ AMD Clang 17.0.6 (CLANG: AOCC_5.0.0-Build#1377 2024_09_24) with OpenMP 5.1
C++ standard: C++17
Embedded fmt library version: 10.2.0
MPI v3.1: Open MPI v5.0.6, package: Open MPI dell7875@dell7875-Precision-7875-Tower Distribution, ident: 5.0.6, repo rev: v5.0.6, Nov 15, 2024
Accelerator configuration:
OPENMP package API: OpenMP
OPENMP package precision: double
OpenMP standard: OpenMP 5.1
FFT information:
FFT precision = double
FFT engine = mpiFFT
FFT library = KISS
Active compile time flags:
-DLAMMPS_GZIP
-DLAMMPS_PNG
-DLAMMPS_JPEG
-DLAMMPS_SMALLBIG
sizeof(smallint): 32-bit
sizeof(imageint): 32-bit
sizeof(tagint): 32-bit
sizeof(bigint): 64-bit
Installed packages:
EXTRA-FIX GRANULAR MOLECULE OPENMP PYTHON RIGID VTK
List of individual style options included in this LAMMPS executable
Script in.AK1
# 1. setup variables and calculate critical time steps
units si
variable fileOrigin universe in.AK1 ## input value
jump subIn.subCal l_variables # Call subscript to calculate the void ratio
label l_variableMain
timestep ${dt}
#timestep 1E-6
# 2. setup simulation environments
#newton on
newton off
boundary f f f
dimension 3
variable skinD equal 5E-4
variable forceCutoff equal 5E-4 # Radius X 2
variable neighCutoff equal ${forceCutoff} #+${skinD}
variable commCutoff equal ${skinD}/2
atom_style hybrid sphere molecular
atom_modify map array sort 1000 ${skinD} # Must be declared before simulation box definition
neighbor ${neighCutoff} bin
neigh_modify delay 0 every 20 check yes once no cluster no exclude molecule/intra all # page 10000000 one 200000
#comm_style brick
comm_style tiled
comm_modify mode single group all vel yes cutoff ${commCutoff}
#processors 4 4 1 numa_nodes 4
processors * * * numa_nodes 4
# 2.1 get current date and time
python timeString return v_strvar format s here """
def timeString():
    import datetime
    return datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
"""
variable strvar python timeString
python timeString invoke
variable date0S string ${strvar} # initial time in string
python timeFloat return v_fltvar format f here """
def timeFloat():
    import time
    return time.time()
"""
variable fltvar python timeFloat
python timeFloat invoke
variable date0F equal ${fltvar} # initial time in float
#print "Float timestamp = ${fltvar}"
variable elapsedTime equal (v_fltvar-v_date0F)
# 3. create box
variable domainXlo equal -1.5*${dimB}/2
variable domainXhi equal 1.5*${dimB}/2
variable domainYlo equal -1.5*${dimL}/2
variable domainYhi equal 1.5*${dimL}/2
variable domainZlo equal -1.5*${dimH}/2
variable domainZhi equal 4.0*${dimH}/2
region domain_3D block ${domainXlo} ${domainXhi} ${domainYlo} ${domainYhi} ${domainZlo} ${domainZhi}
create_box 2 domain_3D # Use 2 atom types
# 4. setup walls
#variable sway equal 0.0
#variable rotate equal 0.0
region bottom_plate plane 0.0 0.0 ${zminB} 0.0 0.0 1.0 side in # move v_sway NULL NULL
region moving_plateL plane ${xminB} 0.0 0.0 1.0 0.0 0.0 side in # move v_sway NULL NULL rotate v_rotate ${xminB} 0.0 ${zminB} 0 1 0
region moving_plateR plane ${xMaxB} 0.0 0.0 -1.0 0.0 0.0 side in # move v_sway NULL NULL rotate v_rotate ${xMaxB} 0.0 ${zminB} 0 1 0
fix bottom all wall/gran/region hertz/history ${kN} NULL ${gammaN} ${gammaT} 0.5 1 region bottom_plate #contacts
fix mL all wall/gran/region hertz/history ${kN} NULL ${gammaN} ${gammaT} 0.5 1 region moving_plateL #contacts
fix mR all wall/gran/region hertz/history ${kN} NULL ${gammaN} ${gammaT} 0.5 1 region moving_plateR #contacts
fix yside_plate all wall/gran hertz/history ${kN} NULL ${gammaN} ${gammaT} 0.5 1 yplane ${yminB} ${yMaxB} #contacts
# 5. Read clumps and setup pairs, EQ of motions
jump subIn.msData l_readClumps # Call subscript to read the clump definitions
label l_RCMain
variable genBoxXlo equal ${xminB}+${MaxClumpDia}
variable genBoxXhi equal ${xMaxB}-${MaxClumpDia}
variable genBoxYlo equal ${yminB}+${MaxClumpDia}
variable genBoxYhi equal ${yMaxB}-${MaxClumpDia}
variable genBoxZlo equal 0.5*${domainZhi}
variable genBoxZhi equal 0.7*${domainZhi}
region gen_area block ${genBoxXlo} ${genBoxXhi} ${genBoxYlo} ${genBoxYhi} ${genBoxZlo} ${genBoxZhi} # Set pouring Space
fix make_clumps_1 all rigid/small molecule mol clumps_01 gravity grav_acc reinit no
compute adjust_DOF all temp/sphere
thermo_modify temp adjust_DOF
pair_style gran/hertz/history ${kN} ${kT} ${gammaN} ${gammaT} 0.5 1
pair_coeff * *
group temp_rigid empty
fix grav_acc temp_rigid gravity 9.81 vector 0.0 0.0 -1.0
fix viscous_damping all viscous 0.0001
# 6. Setup dump
shell if [ -d "post_1" ]; then rm -rf post_1; fi # make directory for post
shell mkdir post_1
dump dump_atoms all vtk ${screenNstep} post_1/atoms*.vtk fx fy fz xu yu id type radius diameter mol x y z vx vy vz
# 7. Pouring
compute compute_atom_vzmax all reduce min vz # Minimum (most negative) z-velocity over all atoms, i.e. the fastest downward motion
compute compute_atom_zmax all reduce max z # Compute the z-position for all atoms and select the maximum
variable runStep equal 0
variable accNinserts equal 0 # Accumulated number of the inserted clumps
variable runTime equal 0
variable nPour equal f_pour_clumps1
variable atomVzmax equal abs(c_compute_atom_vzmax) # Absolute value of the minimum (most negative) z-velocity of the atoms
variable atomZmax equal c_compute_atom_zmax
variable runTime equal ${runTime}+cpu
variable nStep equal step
variable stepPerf equal v_nStep/v_elapsedTime
#fix loadBalance all balance ${screenNstep} 0.9 shift zxy 20 1.1 out info.balancing
fix loadBalance all balance ${screenNstep} 1.01 rcb out info.balancing
timer full
variable printNstep equal v_screenNstep*50
label loopPluviation
variable indexP loop 30
print " "
print "==============================="
print "Pouring stage ${indexP} / 30"
print "==============================="
print " "
fix pour_clumps1 all pour 500 0 4767548 region gen_area mol clumps_01 molfrac 0.05 0.05 0.05 0.05 0.05 &
0.05 0.05 0.05 0.05 0.05 &
0.05 0.05 0.05 0.05 0.05 &
0.05 0.05 0.05 0.05 0.05 rigid make_clumps_1
variable runStep equal ${runStep}+${screenNstep}
run ${runStep} upto
variable accNinserts equal ${accNinserts}+${nPour}
unfix pour_clumps1
print " "
print "==============================="
print "Settling stage ${indexP} / 30"
print "==============================="
print " "
label freeFall
print " "
print "=============================================================="
print "Atoms are still falling, continue freeFall, Atom_Vzmax is - $(v_atomVzmax:%8.5f) m/s"
print "=============================================================="
print " "
fix settling_print1 all print ${printNstep} "Start at:${date0S}, Step:${nStep}, ElapsedTime:$(v_elapsedTime:%8.0f), Steps/sec:$(v_stepPerf:%8.2f)"
fix settling_print2 all print ${printNstep} " Inserted so far:${accNinserts}, Atom_vzmax:$(v_atomVzmax:%8.5f), Atom_zmax:$(v_atomZmax:%8.5f), Atom_Zthresh:$(v_atomZthreshold:%8.5f)"
variable runStep equal ${runStep}+${screenNstep}*100
run ${runStep} upto
# unfix settling_print
if "(${atomZmax} > ${atomZthreshold}) && (${atomVzmax} < 0.001)" then "jump SELF stopPluviation" &
elif "(${atomZmax} > ${atomZthreshold}) && (${atomVzmax} > 0.001)" "jump SELF freeFall"
next indexP # Next step for Pouring
jump SELF loopPluviation
label stopPluviation
variable domainZhiR equal v_atomZmax + 5*${MaxClumpDia}
change_box all z final ${domainZlo} ${domainZhiR}
run 100000
write_restart restart.pour.CDSS
write_data data.pour.CDSS
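To make the control flow of section 7 easier to follow, here is a minimal Python sketch of how I understand the pour/settle loop. It is not something LAMMPS runs: `run_stage` is a hypothetical callback standing in for the `run` commands, assumed to return the values of `atomZmax` and `atomVzmax` after a settling run, and `z_threshold` stands for `atomZthreshold`, which is set in the subscript.

```python
def pluviation(run_stage, z_threshold, v_tol=0.001, n_stages=30):
    """Mirror of the loopPluviation / freeFall jumps (illustrative only)."""
    for stage in range(1, n_stages + 1):      # variable indexP loop 30
        run_stage("pour")                     # fix pour, run, unfix
        while True:                           # label freeFall
            z_max, vz_max = run_stage("settle")
            full = z_max > z_threshold
            if full and vz_max < v_tol:
                return stage, "stopped"       # jump SELF stopPluviation
            if full and vz_max > v_tol:
                continue                      # jump SELF freeFall
            break                             # next indexP
    return n_stages, "exhausted"              # loop variable used up


def make_fake_runner(settle_results):
    """Hypothetical stand-in for LAMMPS: replays canned (z_max, vz_max)."""
    results = iter(settle_results)

    def run_stage(kind):
        return None if kind == "pour" else next(results)

    return run_stage


# Stage 1: box not yet full. Stage 2: full but moving, then full and at rest.
demo = make_fake_runner([(0.1, 0.5), (0.6, 0.5), (0.6, 0.0001)])
result = pluviation(demo, z_threshold=0.5)
print(result)
```

In the actual script both exits end up at `label stopPluviation`: either through the explicit jump, or because, as far as I understand the `next` command, the `jump SELF loopPluviation` that follows it is skipped once `indexP` is exhausted.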
250403.zip (256.8 KB)