subtle issue with change_box and python interface

Hi all - I’m doing something a little out of the usual lammps paradigm, and I’m hoping someone can help me fix my problem in a clean way.

Basically, what I want to do is this:

  1. Use the python interface to do an MD trajectory near the stability edge, so occasionally atoms explode and end up at positions that look like 1e20
  2. Correct for these explosions, so the next evaluation will be with a sensible configuration

The problem is that at the end of step one, the lammps arrays contain the crazy position values. Then, when I do step 2, I try to do the following two things:
a. Send a “change_box …” command from the python interface
b. Send the new positions with a scatter() python call
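
In rough form, through the python module, that sequence looks something like this (the box bounds are placeholders, and the coordinate array would hold my corrected positions):

from ctypes import c_double
from lammps import lammps

lmp = lammps()                    # the instance that just ran the trajectory
natoms = lmp.get_natoms()

# (a) send the change_box command (box bounds here are placeholders)
lmp.command("change_box all x final 0.0 10.0 y final 0.0 10.0 z final 0.0 10.0")

# (b) scatter the corrected coordinates back to lammps
newx = (3 * natoms * c_double)()  # filled with sensible positions in real use
lmp.scatter_atoms("x", 1, 3, newx)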

The problem is that the change_box command calls ChangeBox::command(), which, near the end, calls Domain::remap(). Note that this is not the change_box command-line argument 'remap' (which I don't need or want); this Domain::remap() call always happens. The loop inside Domain::remap() never ends, either because I'm not patient enough to wait for the position to be shifted from 1e+22 back down to 0, or because it never will be, because of underflow.
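
Conceptually (this is just the gist, not the actual lammps source), the per-coordinate loop in Domain::remap() is something like:

# gist of the remap loop for one coordinate (not the real C++ code)
while x >= hi:      # shift down one box length at a time
    x -= period
while x < lo:       # or shift up
    x += period
# with x ~ 1e20 and period ~ 10, "x -= period" doesn't even change x in
# double precision, so the loop really does run forever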

I thought this would be easy to fix: just switch the order of b and a. change_box (without remap) isn't supposed to move atoms. However, in this (weird, I admit) case it does. Starting around line 200 there's a sequence of calls, one of which (domain->reset_box(), I think) triggers a call to Domain::remap(). This remap() maps the atoms (which were set in step b, and correspond to the box that will be set later in ChangeBox::command()) using the previous box. Then, when the actual new periodic boundaries are set later in ChangeBox::command() (the calls to set_global_box), the atoms have already been moved to the wrong places.

The only solution I can think of for this is to go back to my original order, but first test for crazy positions (perhaps a non-trivial cost to check) and reset them (with a possibly expensive scatter call).

Does anyone have any suggestions for a cleaner way of dealing with my problems? I have new, sensible positions and a box. How do I change lammps’s state from the old one with broken positions to the new one?

thanks,
Noam


the first thing that comes to my mind would be that you indeed first
"sanitize" the broken positions from within lammps via an atom-style
variable and the set command. that could be fairly efficient, since it
is a per-atom operation that you only need to do 3 times (x, y, and
z).

axel.

Thanks - so by this you mean something along the lines of
set atom * x 0.0 y 0.0 z 0.0
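
or, to touch only the coordinates that have actually blown up, maybe something like this from the python side (the 1.0e6 cutoff and the variable names are just guesses on my part):

# zero out only coordinates whose magnitude has exploded; cutoff is arbitrary
lmp.command("variable xfix atom (abs(x)<1.0e6)*x")
lmp.command("set atom * x v_xfix")
lmp.command("variable yfix atom (abs(y)<1.0e6)*y")
lmp.command("set atom * y v_yfix")
lmp.command("variable zfix atom (abs(z)<1.0e6)*z")
lmp.command("set atom * z v_zfix")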

Do you have a sense of whether this will be faster, slower, or the same as just doing a python scatter of a bunch of zeros? I guess it should at least be faster because it would mean less MPI communication.

Noam

The term "remap" is overloaded here.

The remap option in the change_box command refers to affine-style
deformations, i.e. if the box warps, warp the atoms as well.

The domain->remap that happens at the end of whatever operation
change_box does is required to ensure all the atoms are still inside
the changed box, e.g. due to changes in where a periodic boundary was.

If your atom coords are bogus when you invoke change_box, I think the
solution is to fix them before calling it.

Steve

Thru the Python interface you can get a pointer (on each proc) to the
coords of atoms (on that proc). This is different than the lib
interface gather_atoms() method. See the extract_atom() method. That
requires no comm, and from Python you can directly manipulate the
atom coords.

Steve

Thanks - that sounds like the quickest way in terms of execution time (although I also like Axel's suggestion for being the easiest), as long as I'm prepared to handle the parallelism.
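
Something like this is roughly what I have in mind (untested sketch; I know the exact extract_atom()/extract_global() signatures differ between versions, and the 1.0e6 cutoff is arbitrary):

nlocal = lmp.extract_global("nlocal")  # number of atoms owned by this proc
x = lmp.extract_atom("x")              # pointer straight into lammps's coords
for i in range(nlocal):
    for d in range(3):
        if abs(x[i][d]) > 1.0e6:       # a "crazy" position
            x[i][d] = 0.0              # fix in place, no MPI communication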

I just realized that it's not completely clear to me how to handle parallelism from the python interface. I'm not doing it right now (well, I'm running the parallel version of lammps, but always on a single MPI process), but I'll need to understand this in the future. The documentation is clear on the operational side of setting things up for parallel runs, but I haven't been able to find as much detail on what you can and can't do while running. I think I can boil it down to three simple questions, at least to start:

Given a LAMMPSlib object created with a particular communicator:

  1. Is it true that each mpi process in the communicator can call extract_atom() independently of the others? If not, what are the restrictions?
  2. Which mpi task can call gather/scatter_atoms()? Any? Only the one with rank=0? All must do it together?
  3. Similarly, which mpi task(s) can call command()?

Noam

The only 2 Py+LAMMPS models I think are possible are as follows:

a) run Python in serial, invoke LAMMPS in serial thru library
b) run Python in parallel on P processes with an MPI for Python,
invoke LAMMPS in parallel thru lib, on the same P proc

When you say you are running Python in serial, and LAMMPS
in parallel, I assume you mean using something like OpenMP
options (e.g. USER-OMP), not MPI parallel. I don’t see
how the latter could work. Of course if you weren’t using
the lib interface, you could write a Python script that
issued commands like “mpirun -np P lmp_g++ < in.foo”
for any P you liked, or subprocess it to do that multiple times.
But then you couldn’t use these lib interface functions.

Now, for your Qs:

  1. extract_atom() simply returns a local ptr to e.g. atom coords
    so for (a) it is all the atoms in the system
    for (b) it is only the atoms on that proc
    either way, no MPI is involved
    if running as (b), it is your Python code which determines
    if all procs call extract_atom() together or not
  2. gather_atoms() does an MPI_Allreduce, so it’s really
    an “all gather”, so for (b) it must be called by all procs
    and all procs will return the full set of all atoms
  3. is the same as (2), all procs must call it (with the same string)
    it doesn’t do an Allreduce per se, but the command it
    invokes might (internal to LAMMPS)
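
E.g. for model (b) every one of the P procs would make the same calls, something like this (sketch only; assumes a Python wrapper built with MPI support and a constructor that accepts a communicator):

from mpi4py import MPI
from lammps import lammps

lmp = lammps(comm=MPI.COMM_WORLD)   # one LAMMPS instance spanning all P procs
lmp.command("run 100")              # (3) all procs issue the command (setup omitted)
allx = lmp.gather_atoms("x", 1, 3)  # (2) collective; every proc gets all coords
x = lmp.extract_atom("x")           # (1) local ptr; only this proc's atoms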

Hope that helps. This info should be added to the Section python
doc page. The simple way to think about it is that Python and the lib
interface have nothing to do with it. You’re just writing a parallel
program with your script. So (1,2,3) work the way you would expect
if you were writing those functions yourself.

Steve

The only 2 Py+LAMMPS models I think are possible are as follows:

a) run Python in serial, invoke LAMMPS in serial thru library
b) run Python in parallel on P processes with an MPI for Python,
invoke LAMMPS in parallel thru lib, on the same P proc

When you say you are running Python in serial, and LAMMPS
in parallel, I assume you mean using something like OpenMP
options (e.g. USER-OMP), not MPI parallel. I don’t see
how the latter could work. Of course if you weren’t using
the lib interface, you could write a Python script that
issued commands like “mpirun -np P lmp_g++ < in.foo”
for any P you liked, or subprocess it to do that multiple times.
But then you couldn’t use these lib interface functions.

I must have been unclear - my python script is running in parallel (using mpi4py), and right now each python MPI process starts an MPI-aware but single-process lammps python interface (using MPI_COMM_SELF). I can't use serially compiled lammps, because it defines routines with names identical to MPI routines, which confuses the dynamic linker. I'm looking to the future, where each python MPI process (which is relatively lightweight) wants to run a parallel lammps trajectory using a few (> 1, but probably <= 16) cores.

Now, for your Qs:

  1. extract_atom() simply returns a local ptr to e.g. atom coords
    so for (a) it is all the atoms in the system
    for (b) it is only the atoms on that proc
    either way, no MPI is involved
    if running as (b), it is your Python code which determines
    if all procs call extract_atom() together or not
  2. gather_atoms() does an MPI_Allreduce, so it’s really
    an “all gather”, so for (b) it must be called by all procs
    and all procs will return the full set of all atoms

OK - that’s exactly what I wanted to know. It’s an allgather, not a gather (from the mpi point of view).

  3. is the same as (2), all procs must call it (with the same string)
    it doesn’t do an Allreduce per se, but the command it
    invokes might (internal to LAMMPS)

Thanks.

Hope that helps. This info should be added to the Section python
doc page.

That’d be helpful.

The simple way to think about it is that Python and the lib
interface have nothing to do with it. You’re just writing a parallel
program with your script. So (1,2,3) work the way you would expect
if you were writing those functions yourself.

It's good that you clarified, since I don't find that as intuitively obvious as you might expect. Especially when it comes to things like the "command" interface, it wasn't at all obvious to me whether lammps expects the command to go only to rank==0, which then communicates it to the other lammps MPI ranks, or directly to all MPI processes.

I think the way I like to think about it is that I’m writing an extension of lammps, which inherits a view of lammps’s parallelization strategy. I’m not communicating with a single “master” lammps process which then deals with its parallelization in a hidden manner.

Noam

The only 2 Py+LAMMPS models I think are possible are as follows:

a) run Python in serial, invoke LAMMPS in serial thru library
b) run Python in parallel on P processes with an MPI for Python,
    invoke LAMMPS in parallel thru lib, on the same P proc

When you say you are running Python in serial, and LAMMPS
in parallel, I assume you mean using something like OpenMP
options (e.g. USER-OMP), not MPI parallel. I don't see
how the latter could work. Of course if you weren't using
the lib interface, you could write a Python script that
issued commands like "mpirun -np P lmp_g++ < in.foo"
for any P you liked, or subprocess it to do that multiple times.
But then you couldn't use these lib interface functions.

I must have been unclear - my python script is running in parallel (using
mpi4py), and right now each python mpi process starts an mpi aware but one
process only lammps python interface (using MPI_COMM_SELF). I can’t use
serially compiled lammps, because it defines routines that have names
identical to mpi routines, which confuses the dynamic linker. I’m looking
to the future, where each python mpi process (which is relatively
lightweight) wants to run a parallel lammps trajectory using a few (> 1, but
probably <= 16) cores.

which means that you are in neither case a) nor case b), but in case
c): run Python in parallel on P*N processes, create P partitions of N
processes from it (using MPI_Comm_split()) and then run a _separate_
LAMMPS instance in each partition. you still have to initialize LAMMPS
on all processes, but then things can get iffy...
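
a minimal mpi4py sketch of that layout (the partition size and log file name are only examples, and it assumes a lammps python module that accepts a communicator and is linked against the same MPI as mpi4py):

from mpi4py import MPI
from lammps import lammps

world = MPI.COMM_WORLD
nper = 4                                  # N procs per lammps instance (example)
color = world.rank // nper                # which of the P partitions this rank is in
subcomm = world.Split(color, world.rank)  # mpi4py wrapper around MPI_Comm_split()

lmp = lammps(comm=subcomm)                # a separate lammps instance per partition
lmp.command("log log.partition%d" % color)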

Now, for your Qs:

1) extract_atom() simply returns a local ptr to e.g. atom coords
    so for (a) it is all the atoms in the system
    for (b) it is only the atoms on that proc
    either way, no MPI is involved
    if running as (b), it is your Python code which determines
    if all procs call extract_atom() together or not
2) gather_atoms() does an MPI_Allreduce, so it's really
    an "all gather", so for (b) it must be called by all procs
    and all procs will return the full set of all atoms

OK - that’s exactly what I wanted to know. It’s an allgather, not a gather
(from the mpi point of view).

which makes no difference in your current case of N=1. it would only
matter in the case of N > 1. also, the term "all processes" is a bit
misleading here. it is "all processes _on the same communicator_".
with MPI_COMM_SELF each of your LAMMPS instances is operating on a
different one.

axel.

Yes, that’s exactly what I’m aiming at. I was sure I had to initialize them on all processes (since that’s how MPI works, basically), but I wasn’t sure how to communicate with them (i.e. which python interface routines to call on which processes) after that.

Adding a trivial example to the documentation, of a simple script that does a parallel lammps run from the python interface, might be nice.

Right.

Yes, "on the same communicator" was implied. I'm assuming that once I split the communicator and pass subcommunicators containing mutually exclusive subsets of tasks to start a set of lammps python objects, each of these lammps runs is completely independent of the others.

Noam