PPPM Cant Map Part of The Skin Region to Processors' Grid

Yuan_Shen · June 22, 2013, 1:40am

Hi, Steve and all!

We are shocking a crystal composed of charged atoms using the fix msst command(lastest lammps). Coulomb interactions are evaluated with the pppm method. Along the shock direction(z), the crystal is compressed by a factor of 2. When, or even after, the volume changes, we get an error threw out by pppm.cpp(line 1895) saying that "Out of range atoms - cannot compute PPPM". We have tried updating the neighbor list every step and making the time_step smaller, the same error just keeps coming out. Based on the following information, we guess that PPPM wont map all the skin region to processors' grid points when the box size is changed dramatically.

    1. We set processor grid to be 4*4*1. So processors dont need to communicate for information about z-position and there is only one subbox in the shock direction.
           boxlo[2]=sublo[2]=6.93(was 0.0 before the shock)
           boxhi[2]=subhi[2]=20.43(was 27.36 before the shock) with center unchanged and volume halved.

2. skin=2.0 as the default value in metal units, so dist[2]=0.5skin=1.0(TIP4P disabled, qdist=0). Thus, pppm should be about to map everything from sublo-dist to subhi+dist, [5.93,21.43], to procgrid. (pppm.cpp fun partical_map line 1879 to line 1891).

3. The atoms are moving outside the box frequently. But when its getting closer to the edge of the skin region, pppm throws error complaining about it. A typical z position we observe is x[i][2]=z=6.19, still in the skin region.

4. When we have more than 1 processors in the z direction, say 4*2*2, jobs are more easily killed by the same error.

And Here is my guess. Taking a look at the full expression of nz (pppm.cpp line 1881)

nz= <int>[(z-boxlo[2])*nz_pppm/zprd+shift]-OFFSET

When the crystal is compressed in z direction and zprd halved, shift(=OFFSET for order=5) stays the same. <int> may not give an upper/lower bound integer as desired. This can make part of the skin region uncovered.

We get a simple but dirty solution now to fix the simulation.

1. Use a n*m*1 processor grid for shocking the z direction. Out of the subbox=Out of the box in z direction.
2. In fix_msst.cpp::remap(), we move each atom thats out of the box one period of the boxlength and enforce it back into the box.

   FixMSST::remap() begin line 697
        double **check_atom_position = atom->x;
          for(int i=0;i<n;i++){
                  if (check_atom_position[i][2]<(domain->boxlo[2]))
                  check_atom_position[i][2] += domain->boxhi[2]-domain->boxlo[2];
                  if (check_atom_position[i][2]>(domain->boxhi[2]))
                  check_atom_position[i][2] -= domain->boxhi[2]-domain->boxlo[2];
        }

Do you guys have any idea about this problem? Are we safe to make this dirty change of the fix_msst code? Will there be any unphysical effect resulting from this change?

Best,

Yuan

akohlmey · June 22, 2013, 3:57am

Hi, Steve and all!

We are shocking a crystal composed of charged atoms using the fix msst command(lastest lammps). Coulomb interactions are evaluated with the pppm method. Along the shock direction(z), the crystal is compressed by a factor of 2.

how do you do this compression and how quickly?

When, or even after, the volume changes, we get an error threw out by
pppm.cpp(line 1895) saying that "Out of range atoms - cannot compute
PPPM". We have tried updating the neighbor list every step and making
the time_step smaller, the same error just keeps coming out. Based on
the following information, we guess that PPPM wont map all the skin
region to processors' grid points when the box size is changed
dramatically.

    1. We set processor grid to be 4*4*1. So processors dont need to communicate for information about z-position and there is only one subbox in the shock direction.
           boxlo[2]=sublo[2]=6.93(was 0.0 before the shock)
           boxhi[2]=subhi[2]=20.43(was 27.36 before the shock) with center unchanged and volume halved.

    2. skin=2.0 as the default value in metal units, so dist[2]=0.5skin=1.0(TIP4P disabled, qdist=0). Thus, pppm should be about to map everything from sublo-dist to subhi+dist, [5.93,21.43], to procgrid. (pppm.cpp fun partical_map line 1879 to line 1891).

    3. The atoms are moving outside the box frequently. But when its getting closer to the edge of the skin region, pppm throws error complaining about it. A typical z position we observe is x[i][2]=z=6.19, still in the skin region.

    4. When we have more than 1 processors in the z direction, say 4*2*2, jobs are more easily killed by the same error.

And Here is my guess. Taking a look at the full expression of nz (pppm.cpp line 1881)

    nz= <int>[(z-boxlo[2])*nz_pppm/zprd+shift]-OFFSET

When the crystal is compressed in z direction and zprd halved, shift(=OFFSET for order=5) stays the same. <int> may not give an upper/lower bound integer as desired. This can make part of the skin region uncovered.

We get a simple but dirty solution now to fix the simulation.

   1. Use a n*m*1 processor grid for shocking the z direction. Out of the subbox=Out of the box in z direction.
   2. In fix_msst.cpp::remap(), we move each atom thats out of the box one period of the boxlength and enforce it back into the box.

   FixMSST::remap() begin line 697
        double **check_atom_position = atom->x;
          for(int i=0;i<n;i++){
                  if (check_atom_position[i][2]<(domain->boxlo[2]))
                  check_atom_position[i][2] += domain->boxhi[2]-domain->boxlo[2];
                  if (check_atom_position[i][2]>(domain->boxhi[2]))
                  check_atom_position[i][2] -= domain->boxhi[2]-domain->boxlo[2];
        }

Do you guys have any idea about this problem? Are we safe to make this dirty change of the fix_msst code? Will there be any unphysical effect resulting from this change?

not sure about this, but i'd rather first try with setting
"communicate cutoff" to a larger value. this options was added for
cases like this, where more atoms need to be communicated between
domains than what is just required by the cutoff and neighborlist
skin.

axel.

Yuan_Shen · June 22, 2013, 6:22am

Hi, Axel!

thanks for your reply. I tried the communicate command with a cutoff of 5.0 (> 2*skin) and still got the same error(out of range in the z direction) at some time step. I dont think its related to the communication between processors cuz we have tried the 4*4*1 procgrid where there is only 1 processor controling all the z positions.

Best,

Yuan

akohlmey · June 22, 2013, 6:38am

Hi, Axel!

thanks for your reply. I tried the communicate command with a cutoff of 5.0 (> 2*skin) and still got the same error(out of range in the z direction) at some time step. I dont think its related to the communication between processors cuz we have tried the 4*4*1 procgrid where there is only 1 processor controling all the z positions.

the communicate cutoff need not be related to the skin distance, it is
only initialized according to it by default, because that is what is
needed for a normal setup. i have done (rather unusual) simulations
where the communicated cutoff needed to be > 100*skin. so please try a
much larger value to make sure.

axel.

sjplimp · June 22, 2013, 11:53am

Based on the following information, we guess that PPPM wont map all the skin region to processors’ grid points when
the box size is changed dramatically.

That’s correct. It won’t do that on-the-fly during a run. Which
can possibly be bad for the issues you are having but also
for the basic accuracy, since the grid size is computed based
on the initial box dimensions.

The “solution” to this is easy. Break your run into pieces
where the box size doesn’t change more than say 10%.
Each run will re-configure the mesh for the current box size.

Steve

LC_Liu · June 25, 2013, 7:40am

Hi, Steve,

Aligning this question, I am encountering something similar. I am writing my own command which does occasional box size change. I am following the logic in “run” command.
Meaning, I break my process into pieces of runs and change box size at the end of each “sub” runs. The box change process involves no time integration. Atom locations are remapped per new box size. The following codes are then added after the box change to account for possible atom migration/location change:

update->integrate->cleanup();
lmp->init();
update->integrate->setup_minimal(1);

Since the box is changed, I think re-configure the grid (from scratch) is better than just migrating the atoms across the boarders.
I used setup_minimal(1) instead of setup() is because I assume the atoms are properly setup and only reneighboring and rebuilding kspace is needed.

However,

I still occasionally encounter atom lost…I think this is due to the intrinsic system problem (such as too large step size). But is there something wrong with my logic?
A bunch of screen output is added when initializing kspace. Is there a way to eliminating the output?

Any comment is greatly appreciated. Thank you so much for your time and effort.

LC Liu

sjplimp · June 25, 2013, 11:56am

I dont know. I think you will have to debug and
identify a specific atom that is lost, then figure
out what it’s coords are at the point it is lost,
and figure out why it occurred. It should only
happen if the atom moved too far (more than
one processor distance away) before
reneighboring which usually indicated
bad dynamics.

Steve

LC_Liu · June 26, 2013, 8:06am

Hi, Steve,

Thank you so much for the comments. I learned something from your information.
However it is difficult to debug since the process is basically not reproducible. (I imposed a event concerning a random number, making the whole process not deterministic)
Also, this is quite a rare event (happens once several nano seconds), so right now I just restart the process when atoms are lost.

Thank you again for your time.

LC Liu