Having (for the moment) worked around my lammps/numpy array issues, I have a perhaps subtle question about symbol resolution.
I have a python script that uses mpi4py, and each MPI process is supposed to start a _serial_ lammps process. I've compiled the lammps shared library in serial (using the default Makefile.serial) with mpi-stubs, and indeed when I run "nm" on the .so I see the MPI routines marked 'T'. When I run the python script truly in serial, without mpi4py imported, it does what I expect: lammps runs in serial and invokes the lammps MPI stubs. However, when I import mpi4py (before or after I import lammps), I get an MPI error when I try to start the LAMMPS process:
[tin:19579] *** An error occurred in MPI_Comm_rank
[tin:19579] *** reported by process [3967680513,0]
[tin:19579] *** on communicator MPI_COMM_WORLD
[tin:19579] *** MPI_ERR_COMM: invalid communicator
[tin:19579] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[tin:19579] *** and potentially your MPI job)
and from what I can tell, the mpi-stubs routine isn't being called (although I'm not 100% sure of this, because I only checked with a printf, and the output may not be getting flushed before the program dies).
Here’s a trivial example that reproduces this problem:
from lammps import lammps
from mpi4py import MPI
print("creating lammps object with mpi4py imported")
lmp = lammps()
print("done")
It gives the error above. If I comment out the line "from mpi4py import MPI", it works fine.
I can think of a couple of possible ways to fix this, but don’t know how to do either of them:
1. Is there any way to ensure that in this somewhat complex, dynamically linked construct, the lammps routines end up invoking the mpi stubs routines rather than the real MPI routines that mpi4py loads (without me having to patch the LAMMPS source)?
2. Even more generally, given that there's clearly interest in having lammps accessible as a python module, is there a way to pass an MPI communicator to the lammps python startup process, so that it operates on that instead of MPI_COMM_WORLD? That way an MPI python script could split MPI_COMM_WORLD and have each LAMMPS instance operate on its own subset of the MPI tasks.
Does anyone have any suggestions in the short term? Is there any interest from the LAMMPS developers in implementing #2, which is (I think) the more general and long-term useful solution? I'd be happy to contribute to a discussion of that possibility, and in principle I'd be happy to contribute a patch, but unfortunately, for our internal bureaucratic reasons, it would be quite a slow process if I had to actually write any code.
thanks,
Noam