Problem with comm->forward_comm_fix

Dear all,

I have developed a fix style for polarization. The code compiles successfully, but I get the following error at run time:

Fatal error in PMPI_Wait: Message truncated, error stack:
PMPI_Wait(183)…: MPI_Wait(request=0x7fff40577328, status=0x1) failed
MPIR_Wait_impl(77)…:
MPIDI_CH3U_Receive_data_found(129): Message from rank 2 and tag 0 truncated; 1440 bytes received but buffer size is 0
Fatal error in PMPI_Wait: Message truncated, error stack:
PMPI_Wait(183)…: MPI_Wait(request=0x7fff1297d4a8, status=0x1) failed
MPIR_Wait_impl(77)…:
MPIDI_CH3U_Receive_data_found(129): Message from rank 0 and tag 0 truncated; 2400 bytes received but buffer size is 0
Fatal error in PMPI_Wait: Message truncated, error stack:
PMPI_Wait(183)…: MPI_Wait(request=0x7fffdde581a8, status=0x1) failed
MPIR_Wait_impl(77)…:
MPIDI_CH3U_Receive_data_found(129): Message from rank 3 and tag 0 truncated; 864 bytes received but buffer size is 0
Fatal error in PMPI_Wait: Message truncated, error stack:
PMPI_Wait(183)…: MPI_Wait(request=0x7fffcc8348a8, status=0x1) failed
MPIR_Wait_impl(77)…:
MPIDI_CH3U_Receive_data_found(129): Message from rank 1 and tag 0 truncated; 1440 bytes received but buffer size is 0

The program stops in comm->forward_comm_fix. The code for packing and unpacking is as follows:

int FixPolar::pack_forward_comm(int n, int *ilist, double *buf, int pbc_flag, int *pbc)
{
  int m = 0;
  switch (packflag) {
    case POLMU:
      for (int i = 0; i < n; i++) {
        int iatom = ilist[i];
        int piatom = apolflag[iatom];
        if (piatom != -1) {
          buf[m++] = Min.x[piatom*3];
          buf[m++] = Min.x[piatom*3+1];
          buf[m++] = Min.x[piatom*3+2];
        }
      }
      break;
  }
  return m;
}

void FixPolar::unpack_forward_comm(int n, int first, double *buf)
{
  int m = 0;
  int last = first+n;
  switch (packflag) {
    case POLMU:
      for (int i = first; i < last; i++) {
        int piatom = apolflag[i];
        if (piatom != -1) {
          Min.x[piatom*3] = buf[m++];
          Min.x[piatom*3+1] = buf[m++];
          Min.x[piatom*3+2] = buf[m++];
        }
      }
      break;
  }
}

Does anyone have an idea what is wrong? I searched on Google, and some people say this error is caused by an insufficient buffer or by sending too much data, but I don't think that is the problem here.

Thanks in advance.

Han

the message from the MPI library is clear and it is definitely because
you are not providing a suitable buffer size to receive the data.
however, with a program as complex as LAMMPS in its communication
patterns it is not always easy to pinpoint the location of a problem.
in part this is due to the fact that LAMMPS doesn't use message tags
to identify the kind of communication and thus any mismatch can lead
to all kinds of problems. it also happens when people misread or
misuse the code or use incompatible or incorrect computations that
collide with what LAMMPS is doing. thus this is not straightforward to
debug with only a small fragment of the code available and no simple
way to track this down.

what is most worrisome is the fact that you get errors on MPI_Wait(),
but Comm::forward_comm_fix() does not use MPI_Wait(), only
Comm::forward_comm() does.
it is also troubling that you seem to be expecting a buffer size of 0
consistently. that could have multiple causes, the most trivial ones
would be that you did not adjust Fix::comm_forward to the necessary
size or didn't provide the current buffer size when calling the
forward communication. another reason could be that your "packflag"
variable is not consistently set across MPI ranks.
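
To make the first of these causes concrete, here is a minimal sketch of the size mismatch. It does not use LAMMPS itself: `MockFix`, `recv_bytes` and `send_bytes` are hypothetical names, and the sizing rule (receive size = `comm_forward` times the number of atoms to receive) is an assumption about how the receive is posted, not a quote from the LAMMPS source. It also assumes 8-byte doubles and that every atom packs three values.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the single Fix member that matters here.
struct MockFix {
  int comm_forward = 0;  // doubles per atom in forward communication; 0 means "none"
};

// Bytes the receiver would reserve for `recvnum` atoms (assumed sizing rule).
std::size_t recv_bytes(const MockFix &fix, int recvnum) {
  assert(recvnum >= 0);
  return static_cast<std::size_t>(fix.comm_forward) * recvnum * sizeof(double);
}

// Bytes the sender ships when each of `sendnum` atoms packs 3 doubles.
std::size_t send_bytes(int sendnum) {
  assert(sendnum >= 0);
  return static_cast<std::size_t>(3) * sendnum * sizeof(double);
}
```

Under these assumptions, 60 atoms with three doubles each give 1440 bytes on the sending side, while a `comm_forward` left at 0 gives a receive size of 0 bytes, which is the pattern in the error messages above. The other reported sizes, 2400 and 864 bytes, are likewise multiples of 24.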

there are plenty of other possible reasons (e.g. mismatched
communication calls from within loops or if/then/else branches,
collisions between explicit MPI calls and LAMMPS communication
routines, lingering unreceived communications).

axel.

I found that I had forgotten to set Fix::comm_forward. Thank you so much!
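
For anyone who runs into the same error, the change can be sketched roughly as follows. This is an illustration only and not the actual FixPolar code: `FixBase`, `FixPolarSketch` and `pack_fits` are hypothetical names, with `FixBase` standing in for the LAMMPS Fix class. The value 3 is an assumption taken from the three values that the pack and unpack loops above move per atom.

```cpp
#include <cassert>

// Hypothetical stand-in for the Fix base class; only the member discussed here.
struct FixBase {
  int comm_forward = 0;  // per-atom doubles for forward communication
  virtual ~FixBase() = default;
};

// Hypothetical skeleton of the polarization fix from this thread.
struct FixPolarSketch : FixBase {
  FixPolarSketch() {
    comm_forward = 3;  // three values per atom, matching the pack/unpack loops
  }
};

// Debug helper (not part of LAMMPS): does a pack call that returned
// m values for n atoms fit into the size announced by comm_forward?
bool pack_fits(const FixBase &fix, int n, int m) {
  assert(n >= 0 && m >= 0);
  return m <= fix.comm_forward * n;
}
```

A check like `pack_fits` could be called temporarily after packing while debugging. With `comm_forward` left at 0 it fails as soon as a single value is packed.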