[lammps-users] Problems with mpi running on a code

Hello everyone,

I’m trying to run a fix that I wrote myself. I tested it on a single processor and it worked properly. However, when I try to use 2 processors, I get this error:

[mpiexec@…2396…] ONE OF THE PROCESSES TERMINATED BADLY: CLEANING UP
APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)

Does anyone know what this error means?

Regards,

Alexandre

It looks like one of your processors crashed. It is likely
that you have a bug in the fix you wrote.

Steve

Thanks, Steve. I found where the bug is, but I have no idea how to solve it.

The problem is that I have a program that creates, for each atom, a string of forces, so that at each timestep it accesses one of the elements of the string and adds it to the total force on the respective atom. After debugging the program, I found that during a timestep, one of the atoms leaves the region where it was created, so that another processor should take care of its dynamics (I verified this by means of printf). However, I couldn’t make its corresponding string of forces move between processors, and that gives the error.

Basically, the problem occurs in a ‘for’ loop that runs through all the local atoms of each processor. Is there some sort of function that checks whether an atom is in the region of a specific processor, so that I could update where the string should be applied?

Thank you very much,

Alexandre

2011/2/15 Steve Plimpton <[email protected]>

alexandre,

the way to uniquely identify an atom across processor boundaries
is to look at its tag property.
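
To illustrate why the tag matters, here is a toy sketch (not LAMMPS code; `ToyProc`, `find_local` and `migrate` are hypothetical names made up for this example). It assumes each processor stores its atoms by local index and fills the hole left by a departing atom with its last atom, so local indices change while tags do not:

```cpp
#include <cassert>
#include <vector>

// Toy model, NOT LAMMPS code: one "processor" and the atoms it owns.
struct ToyProc {
  std::vector<int> tag;  // tag[i] = global ID of the atom at local index i
};

// Return the local index of the atom with this tag, or -1 if not owned here.
int find_local(const ToyProc &p, int global_tag) {
  for (int i = 0; i < static_cast<int>(p.tag.size()); i++)
    if (p.tag[i] == global_tag) return i;
  return -1;
}

// Move the atom with this tag to another processor.
// Assumption of this toy: the sender fills the hole with its last atom.
bool migrate(ToyProc &from, ToyProc &to, int global_tag) {
  int i = find_local(from, global_tag);
  if (i < 0) return false;
  to.tag.push_back(global_tag);   // atom gets a new local index on the receiver
  from.tag[i] = from.tag.back();  // last atom moves into the vacated slot
  from.tag.pop_back();
  return true;
}
```

After a migration, any per-atom array that is still indexed by the old local index points at a different atom (or past the end of the array), which is consistent with the crash described above.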

also you should have a look at:

a fix has multiple *_comm methods that are called at
different times throughout the MD set up and loops.
you can use those to communicate per atom properties.
best to look at other fixes that do similar things.

cheers,
    axel.

More specifically, fixes can carry properties around with an atom
(which only makes sense if you need old information to stay
with the atom, not if you compute the info fresh every timestep).
Many fixes do this - see the pack_exchange, unpack_exchange
routines in any fix and how they use them.

Steve
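
As a rough picture of what "carrying properties around with an atom" involves, here is a self-contained toy sketch. It is not the LAMMPS implementation: `ToyFix`, `NOISESIZE` and `migrate` are hypothetical, and the method signatures are simplified. The idea is that the sender packs the atom's values into a buffer, the receiver unpacks them into a new slot, and the sender copies its last atom into the vacated slot:

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for one processor's share of a fix's per-atom data.
// NOT LAMMPS code: all names here are hypothetical.
const int NOISESIZE = 3;

struct ToyFix {
  std::vector<int> tag;                    // global atom IDs, by local index
  std::vector<std::vector<double>> noise;  // noise[i] belongs to atom tag[i]

  // Copy per-atom data from slot i to slot j.
  void copy_arrays(int i, int j) { noise[j] = noise[i]; }

  // Append atom i's values to buf; return how many were written.
  int pack_exchange(int i, std::vector<double> &buf) const {
    int m = 0;
    for (double v : noise[i]) { buf.push_back(v); m++; }
    return m;
  }

  // Read one atom's values from buf into a new last slot; return how many were read.
  int unpack_exchange(const double *buf) {
    noise.emplace_back(buf, buf + NOISESIZE);
    return NOISESIZE;
  }
};

// Move local atom i from one processor to another, data included.
void migrate(ToyFix &from, ToyFix &to, int i) {
  std::vector<double> buf;
  from.pack_exchange(i, buf);
  to.tag.push_back(from.tag[i]);
  to.unpack_exchange(buf.data());

  // Fill the hole on the sender with its last atom, then shrink.
  int last = static_cast<int>(from.tag.size()) - 1;
  from.tag[i] = from.tag[last];
  from.copy_arrays(last, i);
  from.tag.pop_back();
  from.noise.pop_back();
}
```

In this picture the pack, unpack and copy steps must all use the same per-atom layout; if one of them is skipped or never called, the data and the atom end up at different indices.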

Thanks for the replies. Now I’m trying to use pack_exchange and unpack_exchange. I’ve added a few lines to the code, like the ones below:

/* ----------------------------------------------------------------------
allocate atom-based array
------------------------------------------------------------------------- */

void FixQuantumBath::grow_arrays(int nmax)
{
  // nmaxim = atom->nmax;
  noisex = memory->grow_2d_double_array(noisex, nmax, noisesize, "fix_quantumbath:noisex");
  noisey = memory->grow_2d_double_array(noisey, nmax, noisesize, "fix_quantumbath:noisey");
  noisez = memory->grow_2d_double_array(noisez, nmax, noisesize, "fix_quantumbath:noisez");
}

/* ----------------------------------------------------------------------
copy values within local atom-based arrays
------------------------------------------------------------------------- */

void FixQuantumBath::copy_arrays(int i, int j)
{
  for (int t = 0; t < noisesize; t++) {
    noisex[j][t] = noisex[i][t];
    noisey[j][t] = noisey[i][t];
    noisez[j][t] = noisez[i][t];
  }
}

/* ----------------------------------------------------------------------
pack values in local atom-based arrays for exchange with another proc
------------------------------------------------------------------------- */

int FixQuantumBath::pack_exchange(int i, double *buf)
{
  int m = 0;
  for (int t = 0; t < noisesize; t++) {
    buf[m++] = noisex[i][t];
    buf[m++] = noisey[i][t];
    buf[m++] = noisez[i][t];
  }
  return m;
}

/* ----------------------------------------------------------------------
unpack values in local atom-based arrays from exchange with another proc
------------------------------------------------------------------------- */

int FixQuantumBath::unpack_exchange(int nlocal, double *buf)
{
  int m = 0;
  for (int t = 0; t < noisesize; t++) {
    noisex[nlocal][t] = buf[m++];
    noisey[nlocal][t] = buf[m++];
    noisez[nlocal][t] = buf[m++];
  }
  return m;
}

However, an error occurs in the “copy_arrays” function when it tries to access the “noise” string (it tries to access unallocated memory). Even when I allocate enough space for all the strings up front, which I didn’t want to do, the error I described in my second email still occurs in the same way (a problem with an atom moving between processors). Am I forgetting something?

Thanks,

Alexandre

2011/2/16 Steve Plimpton <sjplimp@…24…>

I don't know - I don't really debug other people's code.
There are many examples of fixes that implement
these routines - they also need calls like this in
their constructor to set up the callback:

  grow_arrays(atom->nmax);
  atom->add_callback(0);

You'll have to debug your own code, figure out
if the routines are getting called as you expect,
and where the memory error is occurring. Valgrind
is your friend.

Steve
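
A toy sketch of why the two constructor calls above could matter. This is not the LAMMPS implementation: `ToyAtom` and `ToyFix` are hypothetical, and `add_callback` here takes a pointer to the fix for simplicity, whereas the call quoted above takes an integer flag. The assumption is that atom storage notifies every registered fix when it grows, so a fix that never registers keeps arrays that are too small:

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Toy model of the callback idea, NOT the LAMMPS implementation.
struct ToyFix {
  int noisesize;
  std::vector<std::vector<double>> noise;  // one row per atom slot

  explicit ToyFix(int n) : noisesize(n) {}

  // Resize the per-atom storage to nmax rows.
  void grow_arrays(int nmax) {
    noise.resize(nmax, std::vector<double>(noisesize, 0.0));
  }

  // at() throws std::out_of_range instead of silently touching bad memory.
  void copy_arrays(int i, int j) { noise.at(j) = noise.at(i); }
};

struct ToyAtom {
  int nmax = 0;
  std::vector<ToyFix *> callbacks;  // fixes to notify when storage grows

  void add_callback(ToyFix *f) { callbacks.push_back(f); }

  // Grow atom storage and tell every registered fix to grow too.
  void grow(int new_nmax) {
    nmax = new_nmax;
    for (ToyFix *f : callbacks) f->grow_arrays(nmax);
  }
};
```

In this toy, a fix that called `grow_arrays` and `add_callback` in its setup can safely copy between any two slots below `nmax`, while an unregistered fix fails on the first `copy_arrays` call. With raw C arrays the same mistake would not throw; it would read or write unallocated memory, which is the kind of error Valgrind reports.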