Steve Plimpton wrote:
There has been a lot of back-and-forth between Riccardo and Axel
on this thread; I just have a few general comments.
Maybe I'm wrong, but as far as I know, MPI doesn't terminate the user
program (at least, it has never happened to me).
The default MPI behavior is to abort on an error. Try
int me; MPI_Comm_rank(-2,&me);
Not as far as I can tell: the code you suggested doesn't abort, it just segfaults (with mpich-shmem), which is what I expected it to do, and which is a different matter from my point.
There's nothing strange about a coding error (like the example you gave above) resulting in a segmentation fault. The point being discussed (or so I think) is that a library should never deliberately terminate the program that uses it.
The problem with the LAMMPS interface (and please keep in mind that I'm not pushing for the inclusion of my patch again; from my point of view this is more of an academic discussion than anything else) is that, on errors, it terminates the whole program that links it.
If someone adds a fix to LAMMPS that dereferences NULL and causes a segmentation fault, no one will ever complain to you that the code crashed (in fact, you can't prevent that). But if a runtime error happens, LAMMPS, as a library, should report the error transparently to the user and guarantee that at least the LAMMPS instance can be finalized correctly (no segfaults or leaks of any sort), at least in principle.
Now, I will not repeat the arguments already put forward by Axel (if for no other reason than that they prove me wrong), and I agree that LAMMPS should probably stay on its current track, especially if a change of direction would come for the sake of very dubious benefits (as Axel correctly pointed out).
I would never give up the intuitive and clean "plug-in" interface LAMMPS features right now in exchange for a better library interface. But as a matter of principle, for a library to deserve the name, it should have some well-defined characteristics, and one of them is not to change the state of the calling program in ways other than those the caller itself exposes (usually through the arguments it passes), and that includes killing the program altogether.
You can change this so that an error code is returned instead, via
MPI_Errhandler_set(). However, some errors leave
MPI in a bad state, where all future calls just return
an error, so you are effectively dead. Moreover, there
are errors that can occur inside MPI which cannot be
recovered from even to give an error return; they just abort,
e.g. dropping a message due to memory issues.
Yes, but again, those are examples that result from misuse of the MPI interface itself (e.g. passing a wrongly shaped memory array, or a NULL pointer) or from some catastrophic system-wide error. For normal runtime errors, the value of the "err" variable can be retrieved, and although it leaves the user no choice but to finalize MPI, it still doesn't mess up the user's memory in the process, at least not intentionally.
I agree with Axel that there are many LAMMPS errors which are
detected too deep to do anything useful to recover. Even syntax
errors in an input command may not be detected until new classes
and memory have been allocated, and the code is not written
to allow those conditions to be recovered from, e.g. to avoid losing
memory, or to unset things that were partially setup before the
error was detected.
I do agree, however, that it might still be nice to let the
driver program trap these errors, via the exception mechanism
you propose. You should just realize that the only safe thing to
do is to destruct LAMMPS and reinstantiate it. I don't think even that
can be guaranteed to be "safe". At a minimum, memory
could be lost, and possibly the destruct() itself could crash.
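The recovery policy described above can be sketched in C++ as follows, assuming the library throws an exception on error instead of aborting. `Simulation`, `SimulationError` and `run_with_restart` are hypothetical stand-ins, not LAMMPS classes: after any error the driver discards the instance and builds a fresh one rather than trying to repair it.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical error type thrown by the library instead of aborting.
struct SimulationError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for a library instance such as a LAMMPS object.
class Simulation {
public:
    // Accepts any non-empty command except "bogus", which simulates an
    // error detected deep inside the library.
    void command(const std::string &cmd) {
        if (cmd.empty() || cmd == "bogus")
            throw SimulationError("Unknown command: " + cmd);
        ++ncommands_;
    }
    int ncommands() const { return ncommands_; }

private:
    int ncommands_ = 0;
};

// Issues each command in turn. After an error the instance may hold
// partially set-up state, so it is replaced by a fresh one instead of
// being reused. Returns the number of failed commands.
int run_with_restart(std::unique_ptr<Simulation> &sim,
                     const std::vector<std::string> &cmds) {
    int failures = 0;
    for (const auto &cmd : cmds) {
        assert(sim != nullptr);
        try {
            sim->command(cmd);
        } catch (const SimulationError &) {
            ++failures;
            // The new instance is constructed first; assigning it to the
            // unique_ptr then destroys the old, possibly corrupted one.
            sim = std::make_unique<Simulation>();
        }
    }
    return failures;
}
```

This only guarantees that the driver never reuses a half-initialized instance; it cannot help if the destructor of the corrupted instance itself crashes or leaks memory.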
You are right, and I have to admit that I thought again about the "scenarios" I described to you in my last mail, and concluded that some of them were quite bogus.
Neither keeping alive an unstable LAMMPS instance that could be deeply corrupted by an error, nor risking a memory leak (if not an outright segfault) by finalizing the corrupted instance, is the kind of risk I would take in one of my codes, so why should someone else?
If you want to interact with a running simulation, then a fix
is the way to do it - not the library interface.
I have written two of them already; by the way, the interface for including them is really *great*.
While the library interface is barebones, it is really just meant to illustrate
what you can do. You can add any function you want to
library.cpp. All of the LAMMPS classes and data structures
are essentially exposed there, so you can poke and peek at
whatever you wish. The extract() function Axel mentioned is
one example of something that gives you that ability, as are
get_coords() and put_coords(). But your imagination is the only
limit.
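As an illustration of adding a function to library.cpp, here is a C++ sketch assuming the C-style pattern in which the caller holds the instance only as an opaque `void *` handle. `MockLammps`, `MockAtom` and `lammps_count_atoms_above` are hypothetical; the mock only mimics the idea of peeking at per-atom coordinates, not the real LAMMPS data layout.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified stand-ins for LAMMPS internals.
struct MockAtom {
    std::vector<double> x;  // flattened coordinates: x0, y0, z0, x1, y1, z1, ...
    int natoms() const { return static_cast<int>(x.size() / 3); }
};

struct MockLammps {
    MockAtom atom;
};

extern "C" {

// Hypothetical new library function. The caller only holds an opaque
// handle; the function casts it back and reads internal data directly.
int lammps_count_atoms_above(void *ptr, double zmin) {
    assert(ptr != nullptr);
    const MockLammps *lmp = static_cast<const MockLammps *>(ptr);
    int count = 0;
    for (int i = 0; i < lmp->atom.natoms(); ++i)
        if (lmp->atom.x[3 * i + 2] > zmin) ++count;  // compare z coordinate
    return count;
}

}  // extern "C"
```

In a parallel run each process owns only a subset of the atoms, so a real version of such a function would also need to combine the per-process counts, for example with MPI_Allreduce.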
I could add your exception wrapper to error.cpp and the lib interface
if you can assure me of one thing I am unclear on. Since there
is no other exception/throw/catch logic in LAMMPS, will adding
this degrade the performance of normal LAMMPS at all, e.g. by
adding compiled code/logic to other parts of LAMMPS? Or does
it just affect the performance of the Error class and the library
interface itself (whichever routines catch the error)? If the latter
is the case, then I don't mind adding it, so the caller can
receive an error return instead of an abort.
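A minimal C++ sketch of the shape such a wrapper could take, assuming the error handler throws instead of aborting and every C-callable library function catches at its boundary. `LibraryException`, `Instance`, `lib_command` and `lib_last_error` are hypothetical names, not the actual patch or the LAMMPS library API.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical exception type thrown by the error handler instead of aborting.
class LibraryException : public std::runtime_error {
public:
    explicit LibraryException(const std::string &msg) : std::runtime_error(msg) {}
};

// Simplified stand-in for an Error class: rather than calling MPI_Abort()
// or exit(), it throws and lets whoever catches it decide what to do.
class Error {
public:
    [[noreturn]] void one(const std::string &msg) { throw LibraryException(msg); }
};

// Simplified stand-in for the instance behind the opaque handle.
struct Instance {
    Error error;
    // Accepts only commands starting with "run"; anything else is an error.
    void command(const std::string &cmd) {
        if (cmd.rfind("run", 0) != 0) error.one("Unknown command: " + cmd);
    }
};

// Message of the most recent error, so a C caller can retrieve it.
static std::string last_error;

extern "C" {

// C entry point. Exceptions must not escape into C code, so they are
// caught here and turned into a return code: 0 on success, -1 on error.
int lib_command(void *handle, const char *cmd) {
    if (handle == nullptr || cmd == nullptr) {
        last_error = "null argument";
        return -1;
    }
    try {
        static_cast<Instance *>(handle)->command(cmd);
        return 0;
    } catch (const std::exception &e) {
        last_error = e.what();
        return -1;
    }
}

const char *lib_last_error() { return last_error.c_str(); }

}  // extern "C"
```

Here the throw lives only in the error handler and the try/catch only in the library entry point, so no other class has to change; whether the non-throwing path is completely free still depends on how the compiler implements exceptions.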
The latter should be the case, but I admit my ignorance of the topic. I have searched the internet a bit, and it seems to be compiler dependent: it is possible to produce zero-cost exception-handling code (and g++ does that), but some compilers might add a very slight cost, at least in the routines that catch the exceptions.
Anyway, strange as it might seem, at this point I would advise you to drop my patch: after the discussion with Axel it seems it would be of very dubious benefit, and although I'm sure it would not degrade anything, it just isn't worth the trouble...
I just hope that the discussion was worth something in itself (I do think so, at least) and that I didn't waste your time.
Also, if you have a Python wrapper for LAMMPS, that
lets you issue commands to LAMMPS from Python, I'd like
to see it, and possibly add it to the distribution. I've known
that's possible through the LAMMPS lib interface, but haven't done
it myself.
OK: I'll clean it up a bit over the next few weeks and send it as soon as it's no longer embarrassing (and once the MPI support is put back in: I didn't need it, so I removed it...).
For now, although it works, it's a mix of SWIG (which just takes care of all the module initialization and type creation... I'm quite lazy) and pure C-Python code. A poor thing.
Riccardo