Memory leak with Python-LAMMPS interface

Looping over LAMMPS instances in a Python script causes a memory leak, specifically when using the create_box command. Here’s a minimal reproducible example that shows the leak:

"""
Simple example to reproduce memory leak using the LAMMPS-Python interface.
Usage:
    python mem-leak.py
"""

import lammps
import os, psutil
#import gc

process = psutil.Process(os.getpid())
mem_before = process.memory_info().rss  # bytes
print(f"Memory before loop: {mem_before}")

nloops = 1000
for l in range(1,nloops+1):

    lmp = lammps.lammps(cmdargs=["-screen", "none"])

    lmp.command("clear")
    lmp.command("units metal")
    lmp.command("atom_style atomic")
    lmp.command("boundary p p p")
    region_command = "region pybox prism 0 10 0 10 0 10 0 0 0"
    lmp.command(region_command)

    # memory leak happens due to create_box; comment this command out to see the difference

    lmp.command("create_box 1 pybox")
    
    # attempts to clean up lmp object:
    #lmp.command("clear")
    lmp.close()
    #del lmp
    #gc.collect()

    print(f"Loop {l} memory: {process.memory_info().rss}") # bytes

mem_after = process.memory_info().rss
print(f"Memory leaked: {mem_after - mem_before}") # bytes

Running this shows the memory increasing with each loop iteration.

I thought LAMMPS would automatically deallocate everything with the clear command, and that lmp.close() would clean things up on the Python side, but that does not seem to be the case.

Maybe whatever is being allocated with create_box is not being deallocated during lmp.close(). I am happy to change the C++ library functions to deallocate more things if necessary. Has anyone seen this before, or does anyone know why it happens?

When reporting any issues, please always mention which LAMMPS version you are using.

I don’t think that this is a LAMMPS issue because we are checking LAMMPS itself regularly for memory leaks. You can double-check this easily by implementing the same code with the C library interface directly:

#include <stdio.h>
#include <sys/resource.h>

#include "library.h"

int main(int argc, char **argv)
{
    struct rusage ru;
    int i;
    void *handle;

    const char *args[] = {"leak-test", "-screen", "none"};
    const int narg = sizeof(args)/sizeof(const char *);

    for (i = 0; i < 1000; ++i) {
        handle = lammps_open_no_mpi(narg, (char **)args, NULL);
        lammps_command(handle, "clear");
        lammps_command(handle, "units metal");
        lammps_command(handle, "atom_style atomic");
        lammps_command(handle, "boundary p p p");
        lammps_command(handle, "region pybox prism 0 10 0 10 0 10 0 0 0");
        lammps_command(handle, "create_box 1 pybox");
        lammps_close(handle);
        getrusage(RUSAGE_SELF, &ru);
        /* ru_maxrss is the peak resident set size; on Linux it is reported in kilobytes */
        printf("% 4d: RSS use: %10.3gMB\n", i+1, (double)ru.ru_maxrss/1024.0);
    }
    return 0;
}

In this case the memory usage is constant (for me). So any explanation for the difference in memory use has to come from whatever Python does.

When I run your script, I also don’t really see a typical leak, since the memory usage plateaus at some point and does not increase even when increasing the number of loop iterations to 10000.
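To make the "plateau versus leak" distinction less of a judgment call, one option is to record the RSS value from every loop iteration and check whether it is still growing over the later part of the run. The following is only a minimal sketch: the helper names, the tail fraction, and the 1 KiB-per-iteration tolerance are arbitrary choices for illustration, and the sample data is synthetic, not measured.

```python
def growth_per_iteration(samples):
    """Least-squares slope of the samples, in bytes per loop iteration."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def looks_like_leak(samples, tail_fraction=0.5, tolerance=1024.0):
    """True if RSS still grows noticeably over the last part of the run.

    tail_fraction and tolerance are illustrative defaults (assumptions),
    not values recommended by LAMMPS or psutil.
    """
    start = int(len(samples) * (1 - tail_fraction))
    return growth_per_iteration(samples[start:]) > tolerance


# Synthetic data for illustration only (not real measurements):
# one series levels off after 200 iterations, the other keeps growing.
plateau = [100_000_000 + min(i, 200) * 50_000 for i in range(1000)]
steady = [100_000_000 + i * 50_000 for i in range(1000)]

print(looks_like_leak(plateau))
print(looks_like_leak(steady))
```

In the original script, the samples could be collected by appending process.memory_info().rss to a list at the end of each loop iteration and passing that list to looks_like_leak after the loop.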

So my interpretation is that what you are seeing is just Python at work trying to keep a cache of data to make it run faster.
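One way to probe this interpretation is the standard library's tracemalloc module, which reports how much memory is currently held through Python's own allocators. It does not see allocations made inside the LAMMPS C++ library, so it can only answer a narrower question: are Python-level objects accumulating between iterations? The sketch below uses a hypothetical helper name and stand-in workloads; in practice the workload would be a function wrapping the body of the loop from the script.

```python
import tracemalloc


def python_heap_growth(workload, repeats=100):
    """Bytes of Python-allocated memory gained over `repeats` calls.

    Only memory allocated through Python's allocators is counted;
    native allocations (e.g. inside a C++ library) are not visible here.
    """
    tracemalloc.start()
    workload()  # warm-up call so one-time setup is not counted
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(repeats):
        workload()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


# Stand-in workloads for illustration (assumptions, not LAMMPS code):
kept = []


def leaky():
    kept.append(bytearray(10_000))  # reference is kept, so memory accumulates


def clean():
    _ = bytearray(10_000)  # released again when the function returns


print(python_heap_growth(leaky))
print(python_heap_growth(clean))
```

If the RSS reported by psutil rises while this number stays near zero, the growth is not coming from Python objects being kept alive, which would be consistent with memory being retained by the allocator or the interpreter rather than leaked by the script.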