ndx2group gives error in parallel

Hello all,

I use ndx2group in my simulation. It runs okay in serial. However, when I run it in parallel with:
mpirun -np 4 ~/software/lammps-7Aug19/src/lmp_mpi <runfile

I get the following errors:
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1584)…: MPI_Bcast(buf=0x555ab8707460, count=4, MPI_CHAR, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1436)…:
MPIR_Bcast(1460)…:
MPIR_Bcast_intra(1241)…:
MPIR_SMP_Bcast(1085)…:
MPIR_Bcast_binomial(250): message sizes do not match across processes in the collective routine: Received 3 but expected 4
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1584)…: MPI_Bcast(buf=0x55c830c9e420, count=4, MPI_CHAR, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1436)…:
MPIR_Bcast(1460)…:
MPIR_Bcast_intra(1241)…:
MPIR_SMP_Bcast(1085)…:
MPIR_Bcast_binomial(250): message sizes do not match across processes in the collective routine: Received 3 but expected 4
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1584)…: MPI_Bcast(buf=0x561acd875460, count=4, MPI_CHAR, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1436)…:
MPIR_Bcast(1460)…:
MPIR_Bcast_intra(1241)…:
MPIR_SMP_Bcast(1085)…:
MPIR_Bcast_binomial(310): Failure during collective

Please find attached my input, data, and index files.

Thanks,
Jamal

data.dat (468 KB)

ndx (2.07 KB)

runfile (309 Bytes)

hi,

thanks for reporting and for providing a simple example to quickly reproduce the issue.
looks like there is an off-by-one error in the command. the following change should correct it.

axel.

diff --git a/src/USER-COLVARS/ndx_group.cpp b/src/USER-COLVARS/ndx_group.cpp
index a1369df2f..55c657b7f 100644
--- a/src/USER-COLVARS/ndx_group.cpp
+++ b/src/USER-COLVARS/ndx_group.cpp
@@ -126,7 +126,7 @@ void Ndx2Group::command(int narg, char **arg)
         }
         name = find_section(fp,NULL);
         if (name != NULL) {
-          len=strlen(name);
+          len=strlen(name)+1;
 
           // skip over group "all", which is called "System" in gromacs
           if (strcmp(name,"System") == 0) continue;

Hi Axel.

Thanks for your prompt response.

I changed the line “len=strlen(name)” to “len=strlen(name)+1” (line 129 of ndx_group.cpp in my version of LAMMPS) and recompiled, but I still get the same error. Maybe I misunderstood your answer. I’d appreciate your help.

Best,
Jamal

please try this more complex set of changes.

axel.

diff --git a/src/USER-COLVARS/ndx_group.cpp b/src/USER-COLVARS/ndx_group.cpp
index a1369df2f..375620821 100644
--- a/src/USER-COLVARS/ndx_group.cpp
+++ b/src/USER-COLVARS/ndx_group.cpp
@@ -126,7 +126,7 @@ void Ndx2Group::command(int narg, char **arg)
         }
         name = find_section(fp,NULL);
         if (name != NULL) {
-          len=strlen(name);
+          len=strlen(name)+1;
 
           // skip over group "all", which is called "System" in gromacs
           if (strcmp(name,"System") == 0) continue;
@@ -152,8 +152,8 @@ void Ndx2Group::command(int narg, char **arg)
         MPI_Bcast(&len,1,MPI_INT,0,world);
         if (len > 0) {
           delete[] name;
-          name = new char[len+1];
-          MPI_Bcast(name,len+1,MPI_CHAR,0,world);
+          name = new char[len];
+          MPI_Bcast(name,len,MPI_CHAR,0,world);
 
           MPI_Bcast(&num,1,MPI_LMP_BIGINT,0,world);
           tags = (tagint *)malloc(sizeof(tagint)*(num ? num : 1));
@@ -174,7 +174,7 @@ void Ndx2Group::command(int narg, char **arg)
         if (name != NULL) delete[] name;
         rewind(fp);
         name = find_section(fp,arg[idx]);
-        if (name != NULL) len=strlen(name);
+        if (name != NULL) len=strlen(name)+1;
 
         if (screen)
           fprintf(screen," %s group '%s'\n",
@@ -185,7 +185,7 @@ void Ndx2Group::command(int narg, char **arg)
 
         MPI_Bcast(&len,1,MPI_INT,0,world);
         if (len > 0) {
-          MPI_Bcast(name,len+1,MPI_CHAR,0,world);
+          MPI_Bcast(name,len,MPI_CHAR,0,world);
           // read tags for atoms in group and broadcast
           num = 0;
           tags = read_section(fp,num);
@@ -199,8 +199,8 @@ void Ndx2Group::command(int narg, char **arg)
         MPI_Bcast(&len,1,MPI_INT,0,world);
         if (len > 0) {
           delete[] name;
-          name = new char[len+1];
-          MPI_Bcast(name,len+1,MPI_CHAR,0,world);
+          name = new char[len];
+          MPI_Bcast(name,len,MPI_CHAR,0,world);
 
           MPI_Bcast(&num,1,MPI_LMP_BIGINT,0,world);
           tags = (tagint *)malloc(sizeof(tagint)*(num ? num : 1));

mpi-fix.diff.gz (858 Bytes)
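
For readers following the patch, the convention it settles on can be sketched without MPI: the length is computed once as strlen(name)+1, so it already counts the terminating '\0', and that same number is then used both to allocate the receive buffer and as the transfer count. This is only an illustration, not the LAMMPS code. The helpers fake_bcast and receive_name are hypothetical names, and a plain memcpy stands in for MPI_Bcast so the snippet runs on its own.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for MPI_Bcast on a char buffer (NOT an MPI call):
// copies exactly `count` bytes from the root's buffer to the receiver's.
static void fake_bcast(const char *root_buf, char *recv_buf, int count) {
  std::memcpy(recv_buf, root_buf, static_cast<std::size_t>(count));
}

// Illustrative helper (not part of LAMMPS): transfer a group name using
// one length value that already includes the terminating '\0'.
std::string receive_name(const char *root_name) {
  // "root" side: length INCLUDING the terminator, as in len=strlen(name)+1
  int len = static_cast<int>(std::strlen(root_name)) + 1;

  // "receiver" side: the same len sizes the buffer and the transfer,
  // mirroring new char[len] followed by a broadcast of len chars
  std::vector<char> buf(static_cast<std::size_t>(len));
  fake_bcast(root_name, buf.data(), len);

  // the terminator arrived with the data, so buf is a valid C string
  assert(buf[static_cast<std::size_t>(len) - 1] == '\0');
  return std::string(buf.data());
}
```

With a three-character name, len is 4, which is the count shown in the failing MPI_Bcast calls above. In a collective, every rank has to pass that same count, which is why the patch changes the length computation and all the broadcast calls together.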