MSEVB using Python API

blake · February 14, 2023, 1:59am

Hello,

I’m trying to implement the multi-state empirical valence bond theory method using LAMMPS and its Python API. I understand that it seems to already have an implementation in the form of RAPTOR, but our research group wants to have an easily modifiable/usable version (hence the Python).

I have a working version of the code (with only a few bugs) that I can send upon request if someone would like a look. Briefly, my current implementation is as follows:

Create one main lammps object that is used to calculate the main potential energy and integrate the equations of motions/thermostat with any necessary fixes.
After each step, a once-off new lammps object is created per EVB state that is identified, with this lammps object reused in subsequent steps. These new lammps objects are created by reading a restart file dumped by the main lammps object (faster way I could see to copy across all the topological information from one object to another).
The topology of the new lammps object is changed through “set” and “create_bonds” commands to match the identified EVB state, and then the “run 0” command is used to allow the computation of the potential energy. From what I can tell, calling run 0 is quite slow assuming because some neighbourlists are changed/updated.
With each subsequent step, given no reaction has happened yet, the EVB-state lammps objects have their positions and box data changed from the “scatter_atoms” method and the “change_box” command. If a reaction occurs, the corresponding lammps object with the new identified state writes a restart file which is read by all other lammps objects (including the main).

With that explained, my question is if there is a more efficient way to do this. The code runs quite slowly (around 1 ns/day for a 25 Angstrom box of water with a hydronium ion in it). The bottleneck seems to be after calling run 0 for each EVB state, so I was wondering if there is some command I have missed that can calculate the potential energy once-off without rebuilding neighbourlists/preparing the object for a simulation. I have also attempted some parallelisation using pylammpsmpi on github (GitHub - pyiron/pylammpsmpi: Parallel Lammps Python interface - control a mpi4py parallel LAMMPS instance from a serial python process or an Jupyter notebook) with little success. It doesn’t seem to like having multiple lammps objects active at one time, and I can’t imagine parallelisation will help with the current bottleneck.

Any tips would be much appreciated.

Cheers,
Blake

akohlmey · February 14, 2023, 3:29am

You can parallelize each LAMMPS instance and over different EVB states and also run the main LAMMPS instance on a separate group of MPI processes.

The important step is to use a new communicator for each LAMMPS instance that you want to use concurrently or alternately. This can be done with mpi4py directly. Please see below for a simple example that will run two LAMMPS instances concurrently on half the processes and maintains two LAMMPS instances per process, so there are 4 LAMMPS simulations in total, but two are running concurrently on different processors each. You can see that by looking at the individual log files.

Second, you can reduce the overhead of running the neighbor list rebuild by adding ‘pre no’ to the run 0 command (you also want to use post no, which you can always do). However, with ‘pre no’ you have to be careful. If there are topology changes, you must not use it and also you need to regularly rebuild your neighbor lists as atoms are moving around. Depending on your system and the timestep, that can be every other run 0 command or every 5 or even every 10. Please keep in mind the law of diminishing returns. When rebuilding the neighbor lists only every 5 steps, the overhead is already reduced by 80% and for every 10 steps, the overhead is reduced by 90%, so there is not much benefit from going beyond that.

#!/usr/bin/env python3
from lammps import lammps
from mpi4py import MPI

me_global = MPI.COMM_WORLD.Get_rank()
np_global = MPI.COMM_WORLD.Get_size()
comm_global = MPI.COMM_WORLD

if me_global == 0:
    print("Running with %d processors" % np_global)

# new communicators for half the ranks each
color = me_global % 2
key = me_global
num = int(np_global/2)
comm = []
comm.append(comm_global.Split(color, key))
comm.append(comm_global.Split(color, key))

lmp = []
args = ['-nocite', '-screen', 'none', '-var', 'id', str(me_global), '-log', 'log.lammps-'+str(me_global)]
lmp.append(lammps(cmdargs=args, comm=comm[0]))
lmp[0].file('in.melt')

args = ['-nocite', '-screen', 'none', '-var', 'id', str(me_global+num), '-log', 'log.lammps-'+str(me_global+num)]
lmp.append(lammps(cmdargs=args, comm=comm[1]))
lmp[1].file('in.melt')

lmp[0].command('run 200 pre no post no')
lmp[1].command('run 200 pre no post no')

lmp[0].command('run 200 pre no post no')
lmp[1].command('run 200 pre no post no')

lmp[0].close()
lmp[1].close()

using the following file as in.melt:

units           lj
atom_style      atomic

lattice         fcc 0.8442
region          box block 0 10 0 10 0 10
create_box      1 box
create_atoms    1 box
mass            * 1.0

velocity        all create 3.0 87287 loop geom

pair_style      lj/cut 3.0
pair_coeff      1 * 1.0 1.0 1.122

neighbor        0.3 bin
neigh_modify    every 10 delay 0 check no

fix             1 all nve

variable       id index 0
dump           id all atom 50 dump-${id}.lammpstrj

thermo          50
run 200 post no

blake · February 15, 2023, 3:51am

Thanks for your response.

I don’t really have much experience with using mpi4py so you’ll have to forgive my naivety. I’ve managed to get my code to create a lammps object per process, ie. I run my python file with four processors and create four communicator objects.

color = me_global % 4
key = me_global 
comm = []
for i in range(4):
    comm.append(comm_global.Split(color, key))

I’ve had no luck really running multiple processors per lammps object, ie, request 16 processors with four processors per lammps object. I might’ve got it working in your simple example:

from lammps import lammps
from mpi4py import MPI

me_global = MPI.COMM_WORLD.Get_rank()
np_global = MPI.COMM_WORLD.Get_size()
comm_global = MPI.COMM_WORLD

if me_global == 0:
    print("Running with %d processors" % np_global)

# new communicators for half the ranks each
color = me_global // 2
key = me_global % 4
# split communicator by color and save to list
comm = []
for i in range(4):
    comm.append(comm_global.Split(i, me_global))

lmp = []
for i in range(4):
    args = ['-nocite', '-screen', 'none', '-var', 'id', str(i), '-log', 'log.lammps-'+str(i)]
    lmp.append(lammps(cmdargs=args, comm=comm[i]))
    lmp[i].file('in.melt')

lmp[0].command('run 10000 pre no post no')
lmp[1].command('run 10000 pre no post no')
lmp[2].command('run 10000 pre no post no')
lmp[3].command('run 10000 pre no post no')

lmp[0].command('run 10000 pre no post no')
lmp[1].command('run 10000 pre no post no')
lmp[2].command('run 10000 pre no post no')
lmp[3].command('run 10000 pre no post no')

lmp[0].close()
lmp[1].close()
lmp[2].close()
lmp[3].close()

But when I try to translate this idea into my code, it breaks when I try and edit the topology of any of the lammps objects, specifically when I call the “set” and “create_bonds” commands.

In the case where I have one object per process, I’ve tried to use the ThreadPoolExecutor from concurrent.futures to run the for loop which changes the topology and calls run 0 for each EVB state concurrently, but I can’t figure out how to specify which communicator to use per thread (if I even need to), and so it also breaks.

On the neighbour list rebuilding, to be sure I understand you correctly, whenever I update the topology I have to call run 0 pre yes, so the only time I can call run with ‘pre no’ is in the case of the main object which deals with the dynamics when I call run 1. However, even when I use a very small timestep of 0.1 fs and run 1 pre no every other step, the simulation explodes.

Cheers,
Blake

akohlmey · February 15, 2023, 4:28am

Blake,

It looks to me that you have a problem understanding MPI parallelization. Please note that the python script is executed on all parallel processes, and when you have commands on different parallel LAMMPS instances, they must be identified accordingly.
If you have 16 total processors (mpirun -np 16) and want to run 4 LAMMPS instances with 4 MPI processes each, then you have only one instance per python script.

To set the processes up you need to do something like this:


color = me_global // 4
key = me_global
comm = comm_global.Split(color, key)
lmp = lammps(cmdargs=['-nocite', '-screen', 'none', '-var', 'id', str(me_global),  '-log', 'log.lammps-'+str(me_global)], comm=comm)
lmp.file('in.melt')

# commands for each parallel LAMMPS instance
if color == 0:
    lmp.command('run 10000 pre no post no')
elif color == 2:
    lmp.command('run 10000 pre no post no')
elif color == 3:
    lmp.command('run 10000 pre no post no')
else:
    lmp.command('run 10000 pre no post no')

# stop parallel instances
lmp.close()

No. Please note the value of “color” determines the group of processes, i.e. the sub-communicator. With color = me_global // 4 you have a value of 0 for the first four MPI processes, a value of 1 for the second 4 and 2, and 3 accordingly for the rest. The value of key determines the order within the sub-communicator and could also be just set to 0 (or the argument left out, since 0 is the default value).

For your application, of course, things become complex since you now need to communicate from your “master” LAMMPS instance to the individual sub-groups. Typically, that would require creating even more custom communicators or you would write out files and read those back.

Mixing MPI and threads this way is a very bad idea.

Whenever there are significant changes to any system that would affect the neighbor lists.

I know too little about your code and the problem you are trying to parallelize is far too complex to discuss via forum messages and specifically for me to explain all the details about MPI parallelism that are relevant.

My advice is thus, find some local expert help with parallel programming. The problem you want to solve is complex to do in parallel. All I can do and have done, is point you into the right direction.

blake · March 21, 2023, 8:33am

Hi Axel,

I’ve had some success using your feedback and now have the calculation of each EVB states running concurrently with each EVB state using multiple processors.

With regards to the re-calculation of the neighbour list each step, I’ve now realised that the documentation says the “pre yes” flag on the run command controls the calculation of the neighbour list and the calculation of the forces. This is somewhat unfortunate as I will always need the forces each step, but, as you said, might only need the neighbour list computed every few timesteps. Is it possible I’m missing a workaround to this problem?

Cheers,
Blake

akohlmey · March 21, 2023, 9:37am

Well, have you tried using “pre no” for steps where you think you won’t need to reneighbor?

This is risky, though, as you may get an incorrect neighbor list and because of that incorrect forces.

blake · March 22, 2023, 1:36am

I have and it seems fine. My main concern is whether or not the forces will be calculated when specifying “pre no” or if the previously computed forces are being used. Likewise, if I set the forces of my system using lmp.scatter_atoms("f", 1, 3) and then call “run 0 pre yes” or “run 0 pre no” will it re-calculate and override the forces that I manually set?

Blake

akohlmey · March 22, 2023, 1:41am

Forces will always be computed. Please see the Verlet::setup_minimal() function.

Yes. The scatter_atoms call has no effect. The forces will be set to 0, always, and then recomputed.

  // compute all forces

  ev_set(update->ntimestep);
  force_clear();
  modify->setup_pre_force(vflag);

  if (pair_compute_flag) force->pair->compute(eflag,vflag);
  else if (force->pair) force->pair->compute_dummy(eflag,vflag);

  if (atom->molecular != Atom::ATOMIC) {

blake · March 22, 2023, 1:54am

Does that mean that there is no efficient way to set the forces of the system through the Python API? Is running the “fix setforce” command for each atom individually the only way?

akohlmey · March 22, 2023, 2:02am

There is fix external command — LAMMPS documentation
The callback may be set from python and be a python function. Please see: set_fix_external_callback to have a function in your python code be called when the fix is invoked and set to update the force array it manages (you can apply the same force in multiple steps, which can be useful if the external code is slow, e.g. a quantum code) and fix_external_get_force to get direct access to the force array and write data to it so that it will then applied during the post force step.

blake · April 18, 2023, 1:35am

Hi Axel,

I’ve been trying to figure out if there’s a way to redistribute the MPI ranks amongst the identified EVB states after they’ve already been created. For example, if I have requested 20 MPI processes, and then I create four unique LAMMPS systems, (four EVB states) each with 5 processes, but then after running for 10 steps there are now only three EVB states identified, I would like to redistribute so that the processors are split evenly now between the three states (ie, 7, 7, 6). I’ve tried to achieve this by calling the close() method on each LAMMPS object and then recreating them with the new split communicator object, but it seems like there might be something going on in the background (something to do with the finalize() method?) preventing this from working. Is there anyway I may be able to achieve this?

Cheers,
Blake

akohlmey · April 18, 2023, 1:57am

I have no problem recreating the LAMMPS objects with different numbers of sub-communicators in multiple chunks.

from mpi4py import MPI
from lammps import lammps

allcomm = MPI.COMM_WORLD
rank = allcomm.Get_rank()

# split into two groups
color = rank // 2
comm = allcomm.Split(color)

lmp = lammps(cmdargs=['-screen', 'none', '-log', 'log.first-' + str(color)], comm=comm)
lmp.file('in.melt')
lmp.close()
comm.Free()

# split into three groups
color = rank // 3
comm = allcomm.Split(color)    
    
lmp = lammps(cmdargs=['-screen', 'none', '-log', 'log.second-' + str(color)], comm=comm)
lmp.file('in.melt')
lmp.close()
comm.Free()

# split into four groups
color = rank // 4
comm = allcomm.Split(color)    
    
lmp = lammps(cmdargs=['-screen', 'none', '-log', 'log.third-' + str(color)], comm=comm)
lmp.file('in.melt')
lmp.close()
comm.Free()

lmp.finalize()

blake · April 18, 2023, 12:30pm

I’ve tried to make a minimal example to show you my problem:

from mpi4py import MPI
from lammps import lammps
import numpy as np

allcomm = MPI.COMM_WORLD
np_global = MPI.COMM_WORLD.Get_size()
rank = allcomm.Get_rank()

# split into one group
color = 0
comm = allcomm.Split(color)

cmds = [
    "units metal",
    "atom_style full",
    "boundary p p p",
    "read_data coord.lmp",
    "change_box all triclinic",
    "include ff.lmp",
    "fix md all nve",
    "fix tst all temp/csvr 300 300 0.1 39827",
    "timestep 0.001",
    "velocity all create 300 93847 mom yes dist gaussian",
    "fix com all momentum 100 linear 1 1 1",
    "fix ext all external pf/array 1",
    "run 0 post no",
    "run 1000 post no",
]

lmp = lammps(cmdargs=['-screen', 'none', '-log', 'log.first-' + str(color)], comm=comm)
lmp.commands_list(cmds)
lmp.close()
comm.Free()

# split into three groups
color_list = sorted(np.arange(np_global) % 3)
color = color_list[rank]
comm = allcomm.Split(color)

lmp = lammps(cmdargs=['-screen', 'none', '-log', 'log.second-' + str(color)], comm=comm)
lmp.commands_list(cmds)
lmp.close()
comm.Free()

lmp.finalize()

where coord.lmp and ff.lmp are:

ff.lmp (3.2 KB)
coord.lmp (88.8 KB)

The issue is every so often (maybe one out of every 10 times I run this python file) I’ll get an error along the lines of ERROR on proc 0: Bond atoms 230 231 missing on proc 0 at step 23 (src/ntopo_bond_all.cpp:59) Last command: run 1000 post no.

I thought the error might be with my input, but the problem only ever happens on the ‘second’ attempt, and never have I seen it happen on the first.

I would appreciate any help, thanks

akohlmey · April 18, 2023, 1:48pm

It is, in part, a problem with your input, but not where you might think it is. The issue, that I see is that you are using fix external without setting a callback function. This will result in, both, its local force array and its local virial array to be uninitialized. This is not a problem on the first run, since those are “new” allocations, and due to the way how memory allocation works on Linux, those will be all set to 0.0. However, this is not true on the second and any subsequent run. Most of the allocated memory will be freed by lmp.close() and comm.Delete() and then the memory allocation algorithm will “recycle” previously allocated memory, but not necessarily at the same address for the same purpose and thus in the second and any subsequent runs, you can have random and bogus data in your force arrays leading to errors due to unphysical forces.

This can be suppressed by the following change to fix external:

  diff --git a/src/fix_external.cpp b/src/fix_external.cpp
  index c9d019d963..349558fe86 100644
  --- a/src/fix_external.cpp
  +++ b/src/fix_external.cpp
  @@ -65,6 +65,7 @@ FixExternal::FixExternal(LAMMPS *lmp, int narg, char **arg) :
     atom->add_callback(Atom::GROW);
   
     user_energy = 0.0;
  +  memset(user_virial, 0, sizeof(user_virial));
   
     // optional vector of values provided by caller
     // vector_flag and size_vector are setup via set_vector_length()
  @@ -321,6 +322,7 @@ double FixExternal::memory_usage()
   void FixExternal::grow_arrays(int nmax)
   {
     memory->grow(fexternal,nmax,3,"external:fexternal");
  +  memset(&fexternal[0][0], 0, sizeof(double)*3*nmax);
   }
   
   /* ----------------------------------------------------------------------

blake · April 19, 2023, 3:55am

Thank you very much, I never would have reached that conclusion by myself. This seems to have fixed my problem

blake · April 27, 2023, 8:44am

Hi Axel,

Another question. In the case where I’d like to run constant pressure (fix md all nph iso 1 1 1 tchain 5 pchain 5 mtk yes), I need to retrieve the per atom virial tensors per EVB state, mix them then set them in the ‘main’ lammps object. So far, I’ve got this to work by creating a compute for the virial compute vir all stress/atom NULL virial, calling lmp.numpy.extract_compute("vir", 1 ,2) and then mixing and calling lmp.numpy.fix_external_set_virial_peratom("ext", virial). However, I am worried that this method might be recomputing the per atom virial each step (requiring a run 0 pre yes post no every step) instead of grabbing the per atom virial that must already be computed each step. From what I can tell, there is no gather_atoms for the virial, which would be ideal in my case. Is what I’ve found correct, or am I missing something?

Cheers,
Blake

akohlmey · April 27, 2023, 10:43am

The per-atom virial information is usually not collected unless there is a “consumer” for it.
The global pairwise virial (required by the barostat) can much faster be computed before the reverse communication is done via F dot r, and so that is done and the pairwise tallying of per-atom virial data skipped.

To change this, you have to add a “consumer” for your virial data from compute stress/atom. This could be an instance of fix store/state, for example. And you can then retrieve the latest updated per-atom virial data from the fix. Since fixes allocate permanent storage, you can also access it in between runs.

blake · May 1, 2023, 4:02am

I’ve had a go trying to implement what you’ve described. Below is my code:

from mpi4py import MPI
from lammps import lammps
import numpy as np

allcomm = MPI.COMM_WORLD
np_global = MPI.COMM_WORLD.Get_size()
rank = allcomm.Get_rank()

# split into two groups
color = 0
comm = allcomm.Split(color)

cmds = [
    "units metal",
    "atom_style full",
    "boundary p p p",
    "read_data coord.lmp",
    "change_box all triclinic",
    "include ff.lmp",
    "fix md all nve",
    "fix tst all temp/csvr 300 300 0.1 39827",
    "timestep 0.001",
    "velocity all create 300 93847 mom yes dist gaussian",
    "fix com all momentum 100 linear 1 1 1",
    # forces
    "fix ext_for all external pf/array 1",
    "fix_modify ext_for virial no",
    # virial
    "fix ext_vir all external pf/callback 1 1",
    "fix_modify ext_vir virial yes",
    "compute virial all stress/atom NULL virial",
    "fix ss_vir all store/state 1 c_virial[1] c_virial[2] c_virial[3] c_virial[4] c_virial[5] c_virial[6]",
    "variable str_vir string tmp"
]

def callback(lmp, ntimestep, local, tag, x, f):
    str_vir = lmp.extract_variable("str_vir")
    arr_vir = eval('np.array(' + str_vir + ')')
    vatom = arr_vir[tag - 1]
    lmp.fix_external_set_virial_peratom("ext_vir", vatom)


lmp = lammps(cmdargs=['-screen', 'none', '-log', 'log.first-' + str(color)], comm=comm)
lmp.commands_list(cmds)
g_ids = lmp.gather_atoms("id", 0, 1)
force = lmp.fix_external_get_force("ext_for")
ids = lmp.numpy.extract_atom("id")
for n, _ in enumerate(ids):
    force[n][0] = 0.0
    force[n][1] = 0.0
    force[n][2] = 0.0
vir_0_str = np.array2string(
    np.zeros((len(g_ids), 6)), threshold=len(g_ids) * 6 + 1, precision=10, separator=',', suppress_small=True
)
lmp.set_variable("str_vir", vir_0_str)
lmp.set_fix_external_callback("ext_vir", callback, lmp)
lmp.command("run 0 post no")

vir = np.zeros((len(g_ids), 6))
ids = lmp.numpy.extract_atom("id")
virial = lmp.numpy.extract_fix("ss_vir", 1, 2)
vir[ids - 1] = virial
if rank == 0:
    for source in range(1, np_global):
        _vir = allcomm.recv(
            source=source, tag=source
        )
        vir += _vir
else:
    allcomm.send(vir, dest=0, tag=rank)
vir = allcomm.bcast(vir, root=0)
array_string = np.array2string(vir, threshold=len(vir) * 6 + 1, precision=10, separator=',', suppress_small=True)
lmp.set_variable("str_vir", array_string)

lmp.command("run 0 post no")
lmp.command("run 1 post no") # BREAKS HERE

lmp.close()
comm.Free()
lmp.finalize()

The gist is that I use the pf/callback fix external to set the peratom virial from an array that I pass into the lmp object as a string variable (very clunky but I don’t know how else I would do it).

I’m hoping if you could tell me if this is a sensible way of doing this, and also to tell me why it doesn’t work. I get the following error when calling run 1 post no:

ERROR: Per-atom virial was not tallied on needed timestep (src/compute_stress_atom.cpp:145)
Last command: run 1 post no

I’ve looked into the docs for the variable command under ‘Variable Accuracy’, but none of the solutions have worked for me.

akohlmey · May 1, 2023, 11:30am

I have no idea how to do such things on the python level.
Several of the steps you are doing do not make much sense to me:

why two different fix external instances? just pick one and stick with it.
the use of a string variable makes no sense at all since it is a global and not a per-atom property; since you have 6 virial components and those are per-atom properties, the only way to pass them would be 6 separate atom style variables. But do you really need the per-atom virial? Wouldn’t a global virial be sufficient? That would make it much easier to pass them around.
setting the values of atom style variables from a command is not directly possible, but you can use compute or fix references, e.g. variable vir1 atom f_ss_vir[1]
you cannot access a stress/atom compute unless you have at least one complete force computation, since the compute only makes data available that is collected during the force computation.

I doubt that you will find ready-to-use solutions in the documentation since what you are trying to do is far beyond what people usually consider.

I have recently implemented a 2-partition system with mixed forces and virial for use with alchemical transformations in fix alchemy command — LAMMPS documentation (source in src/REPLICA/fix_alchemy.cpp/h). It may be possible to copy and modify this fix to make it suitable for your needs to do the heavy lifting for your model.

blake · May 3, 2023, 5:00am

Thanks for your feedback Axel. You’re correct about being able to use the global virial tensor for my purposes, however it seems to be giving me behaviour I wouldn’t have expected. When using fix external pf/array and calling fix_external_set_virial_global("ext", [10,10,10,10,10,10]) for example, I would have expected that each component in the passed list would be added to the corresponding global virial tensor component. However, what I have found is that each list component is multiplied by some constant (that seems related to the box volume) and then added to the global virial tensor. Is this the case? If so, why?

Cheers,
Blake