LAMMPS run time error

Hi,
I am running a LAMMPS test on an Intel cluster.
For the same test, I can successfully finish runs under 64 cores.
But when I run with 128 or 256 cores, I get the following error.
Could anyone help with this? Many thanks.

PPPM initialization …
G vector (1/distance)= 0.248835
grid = 25 32 32
stencil order = 5
estimated absolute RMS force accuracy = 0.0355478
estimated relative force accuracy = 0.000107051
using double precision FFTs
3d grid and FFT values/proc = 972 112
Setting up run …
Fatal error in MPI_Allreduce: Message truncated, error stack:
MPI_Allreduce(1890)…: MPI_Allreduce(sbuf=0x7fff18285c60, rbuf=0x200dd30, count=1, MPI_INT, MPI_MAX, MPI_COMM_WORLD) failed
MPIR_Allreduce(294)…:
MPIR_Reduce(1039)…:
MPIR_Reduce_binomial(172)…:
MPIDI_CH3_PktHandler_EagerShortSend(356): Message from rank 1 and tag 11 truncated; 8 bytes received but buffer size is 4
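
In case it helps with diagnosis, this is how I read the last line of the stack: the failing MPI_Allreduce was called with count=1 and MPI_INT, which I assume is 4 bytes on this cluster, yet the message from rank 1 carried 8 bytes. I do not know what causes the mismatch. The small Python sketch below only redoes that arithmetic from the two log lines; the helper names and the size table are my own and are not part of LAMMPS or MPI.

```python
import re

# Two lines copied from the error stack in the log above.
ALLREDUCE_LINE = ("MPI_Allreduce(sbuf=0x7fff18285c60, rbuf=0x200dd30, count=1, "
                  "MPI_INT, MPI_MAX, MPI_COMM_WORLD) failed")
TRUNCATION_LINE = ("Message from rank 1 and tag 11 truncated; "
                   "8 bytes received but buffer size is 4")

# Assumption: MPI_INT occupies 4 bytes, as on typical x86-64 builds.
ASSUMED_SIZES = {"MPI_INT": 4}


def expected_bytes(line):
    """Bytes the local call expects: count * assumed datatype size."""
    m = re.search(r"count=(\d+), (MPI_\w+),", line)
    count, dtype = int(m.group(1)), m.group(2)
    return count * ASSUMED_SIZES[dtype]


def truncation(line):
    """Return (bytes received, receive buffer size) from the truncation line."""
    m = re.search(r"(\d+) bytes received but buffer size is (\d+)", line)
    return int(m.group(1)), int(m.group(2))


expected = expected_bytes(ALLREDUCE_LINE)
received, buffer_size = truncation(TRUNCATION_LINE)
print("expected by local call:", expected)
print("received from rank 1:  ", received)
print("receive buffer size:   ", buffer_size)
```

So the receive buffer matches what the local call asked for, and the incoming message is twice that size under the 4-byte assumption.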