linking LAMMPS with NLopt

Dear LAMMPS users,

I am trying to extend the available minimizers for LAMMPS by writing an interface to the NLopt (LGPL-licensed) optimization library.
Getting the basics working turned out not to be too difficult (e.g. LBFGS works on a single core), but I have some remaining issues that I hope someone can help me with.

NLopt works through an abstract optimizer object which calls an "objective function".
This objective function is a function pointer that returns the energy and fills in the gradients if requested.
The nice thing about NLopt is that the algorithm can be switched simply by changing one number, and it adds about 10 local gradient-based optimizers, about 10 local derivative-free optimizers, and some 20 global optimizers.
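To sketch the pattern for readers who have not used NLopt, here is a minimal standalone example (just the bare NLopt C API with a toy quadratic, not my LAMMPS code; in the actual interface the objective calls energy_force() instead):

#include <cstdio>
#include <nlopt.h>

// NLopt objective: return f(x); fill grad (length n) only when grad != NULL.
static double toy_objective(unsigned n, const double *x, double *grad, void *data)
{
    (void)data;
    double f = 0.0;
    for (unsigned i = 0; i < n; i++) {
        f += (x[i] - 1.0) * (x[i] - 1.0);
        if (grad) grad[i] = 2.0 * (x[i] - 1.0);
    }
    return f;
}

int main()
{
    const unsigned n = 3;
    double x[3] = {0.0, 0.0, 0.0};   // starting point
    double minf;

    // Switching algorithm is a one-enum change, e.g. NLOPT_LD_SLSQP or NLOPT_LN_COBYLA.
    nlopt_opt opt = nlopt_create(NLOPT_LD_LBFGS, n);
    nlopt_set_min_objective(opt, toy_objective, NULL);
    nlopt_set_ftol_rel(opt, 1e-10);
    nlopt_set_maxeval(opt, 1000);

    if (nlopt_optimize(opt, x, &minf) < 0) std::printf("nlopt failed\n");
    else std::printf("minimum %g at (%g %g %g)\n", minf, x[0], x[1], x[2]);

    nlopt_destroy(opt);
    return 0;
}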

My source and the other necessary changes (with respect to the stable LAMMPS version of 02 June 2012) are at the end of this mail.

The main problem I have is with MPI.
I was assuming the function "iterate" is only called on the parent node, but it actually runs on all nodes.
This is a problem because the objective function should be evaluated on only one MPI rank (the parent), whereas the computation of energies and forces should be done in parallel.
If I simply add something like
if (comm->me == 0) { ... run optimizer ... }
LAMMPS hangs on the first global communication, e.g. an MPI_SUM reduction.
I am rather new to MPI; could someone tell me the correct way to handle this?

Some other smaller things,

  • I could add constraints (like a minimum interatomic distance), which a global optimizer would need to make sense, but I'm not sure anyone would ever use this
  • the following line seems a bit redundant because I am already passing a reference to xvec, but depending on the algorithm it is needed (e.g. NLOPT_LD_SLSQP) or not (e.g. NLOPT_LD_LBFGS); see the sketch after this list
    for (int i = 0; i < n; i++) xvec[i] = x[i]; // make sure x is copied
  • This may be more of an NLopt issue, but I needed to add an 'eps' prefactor to scale the forces to make the step sizes reasonable (also shown in the sketch after this list).
    Convergence is quite independent of 'eps'; this is just to make sure the system doesn't explode.
    I cannot set a 'dmax' like in LAMMPS because that is simply not how these algorithms work.
    Do you think the fact that this is necessary indicates a bug?
    If not, is there a more elegant solution? I'm a bit worried about 'unit' dependence etc.
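To make the last two points concrete, here is roughly what the objective wrapper looks like (a simplified sketch rather than my actual code; the class name MinNLopt is made up, xvec/fvec are the flattened coordinate/force vectors of the LAMMPS Min class, and here the returned energy is scaled by the same 'eps' so that function value and gradient stay consistent):

double MinNLopt::objective(unsigned n, const double *x, double *grad)
{
  // Copy NLopt's trial coordinates into LAMMPS' coordinate vector.  Some
  // algorithms (e.g. NLOPT_LD_SLSQP) hand back their own work array rather
  // than the array we passed in, so in general this copy cannot be skipped.
  for (unsigned i = 0; i < n; i++) xvec[i] = x[i];

  double energy = energy_force(0);          // energy/force evaluation

  if (grad) {
    // gradient = dE/dx = -force; 'eps' is the ad-hoc prefactor that keeps
    // the trial step sizes reasonable.  Scaling the returned energy by the
    // same factor just minimizes eps*E, which has the same minimizer.
    for (unsigned i = 0; i < n; i++) grad[i] = -eps * fvec[i];
  }
  return eps * energy;
}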

Any other suggestions to clean up or improve the code are welcome of course.

Thanks!

Kind regards,
Jaap

Well, this sounds interesting: hooking LAMMPS to an external
package of minimizers with rich options, including constraints.

> The main problem I have is with MPI.
> I was assuming the function "iterate" is only called on the parent node,
> but it actually runs on all nodes.
> This is a problem because the objective function should be evaluated on
> only one MPI rank (the parent), whereas the computation of energies and
> forces should be done in parallel.
> If I simply add something like
> if (comm->me == 0) { ... run optimizer ... }
> LAMMPS hangs on the first global communication, e.g. an MPI_SUM reduction.
> I am rather new to MPI; could someone tell me the correct way to handle
> this?

Looking at your code, I don't quite see how this is supposed to work.
Basically I don't understand whether NLOPT is doing the optimization
in parallel or not. If it is, I don't see how. If it isn't, I'm not sure
how useful it will be for LAMMPS.

Here is the sequence of ops I see in the code:

a) iterate() is called by LAMMPS
b) iterate() calls nlopt_optimize(), which is in the NLOPT package
c) it makes numerous callbacks to objective_function(n,x,grad) through the wrapper (sketched below)
d) objective_function() calls energy_force(), which is how LAMMPS evaluates
    the energy and forces (for all atoms in parallel)
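For reference, I assume the wrapper in (c) is the usual static trampoline that NLopt's C callback interface requires, something like this (names illustrative):

// NLopt only accepts a plain C function pointer plus a void* user pointer,
// so a static function forwards the call to the C++ member function.
static double nlopt_wrapper(unsigned n, const double *x, double *grad, void *data)
{
  MinNLopt *min = static_cast<MinNLopt *>(data);
  return min->objective(n, x, grad);
}

// registered e.g. inside MinNLopt::iterate():
//   nlopt_set_min_objective(opt, nlopt_wrapper, this);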

You say (b) should only be invoked by proc 0,
which means only proc 0 will call the callback,
but from within the callback you want all the processors
to make the call to energy_force(), which is what
LAMMPS requires? Is that correct?
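If so, the hang you see is just MPI collective semantics: a reduction like
the MPI_SUM inside energy_force() only returns once every rank in the
communicator has called it, so proc 0 blocks forever when the other procs
skip the optimizer branch. Schematically (not LAMMPS source):

#include <mpi.h>

// Every rank owns some atoms and must take part in the reduction; the call
// returns the same total on all ranks, but only after all ranks have entered it.
double total_energy(double my_partial_energy, MPI_Comm world)
{
  double etotal = 0.0;
  MPI_Allreduce(&my_partial_energy, &etotal, 1, MPI_DOUBLE, MPI_SUM, world);
  return etotal;
}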

I think this could be done by having all procs that
aren't proc 0 wait for proc 0 to signal them that the objective
function has been called, then they all call energy_force()
together.
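Here is an untested sketch of what I mean (flag values, member names, and
the coordinate broadcast are left schematic):

enum {EVAL = 1, DONE = 0};

void MinNLopt::run_optimizer()
{
  if (comm->me == 0) {
    nlopt_optimize(opt, x0, &minf);           // drives the callbacks below
    int flag = DONE;
    MPI_Bcast(&flag, 1, MPI_INT, 0, world);   // release the other procs
  } else {
    while (true) {
      int flag;
      MPI_Bcast(&flag, 1, MPI_INT, 0, world);
      if (flag == DONE) break;
      // ... receive the trial coordinates broadcast by proc 0 ...
      energy_force(0);                        // entered by all procs together
    }
  }
}

// and at the top of the proc-0 objective callback:
//   int flag = EVAL;
//   MPI_Bcast(&flag, 1, MPI_INT, 0, world);
//   ... broadcast/scatter the trial coordinates ...
//   double e = energy_force(0);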

But here is a more fundamental question:

The args to the callback are n, x, grad, and grad is force in
LAMMPS lingo. Does that mean n = global degrees of freedom
= 3*natoms, and x and grad are global vectors of all the coords and forces?
If so, I don't see how this could work very well in parallel.
It would mean all the atom coords and forces have to be
gathered to proc 0 on every iteration of the minimizer.
Those could be huge vectors for a large problem (they won't fit
on one proc), and they would also have to be scattered back to
each proc every iteration if only proc 0 is updating them, which
also sounds not very parallel.

I was expecting to see another callback from NLOPT where
it performed an update of x, e.g. with some vector and alpha,
which could also be done in parallel, like energy_force().
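I.e. something along these lines (illustrative only, not the NLopt API):
each proc applies the step to its own atoms, and the only global
communication is a small reduction for dot products, much like the
built-in LAMMPS minimizers do.

#include <mpi.h>

// x <- x + alpha*p on this proc's 3*nlocal degrees of freedom; no communication.
void apply_step(double *x_local, const double *p_local, int n3local, double alpha)
{
  for (int i = 0; i < n3local; i++) x_local[i] += alpha * p_local[i];
}

// global dot product: local work plus one scalar Allreduce.
double dot_global(const double *a, const double *b, int n3local, MPI_Comm world)
{
  double local = 0.0, global = 0.0;
  for (int i = 0; i < n3local; i++) local += a[i] * b[i];
  MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, world);
  return global;
}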

Steve

Dear Steve Plimpton,

Thanks for your message; I think everything you say is correct.
NLopt is not parallelized. I found one such suggestion on their mailing list which was never answered, and I don't know whether other (GPL-compatible) parallel optimization libraries exist.
The reason, I guess, is that the cost of the algorithm itself is typically negligible compared to the objective function, but this may of course not be the case for an optimization with millions of atoms/degrees of freedom parallelized over thousands of CPUs, especially because of the increased communication time. I don't know how common geometry optimization is for such systems, but it is certainly not in line with the "Large-scale" part of the LAMMPS philosophy.
For the quasi-Newton methods: the memory will typically be dominated by the number of steps that are stored to construct the Hessian estimate, of the order of 10-100. So I think the memory would still be manageable on a single CPU (10^6 atoms would be about 10^8 bytes / 100 MB).
Global or derivative-free algorithms might use more memory.

I will try to follow your suggestion to parallelize at least the “energy_force()” calculation and will send the result if it works.
If nothing else, I can at least use it for my systems which are generally quite small.

Thanks again.

Best,

Jaap