Memory issue

Hello Sir,

I am using LAMMPS as a library from Fortran. I need to run the LAMMPS part of the code for many iterations (say 20,000), and I am running into a memory leak (the error is copied at the end). I believe the leak is caused by the LAMMPS part: I have checked my own Fortran code, and I have attached a simplified Fortran code that contains only the LAMMPS part.

Memory usage keeps increasing with every iteration while LAMMPS is running. As an estimate, on a computer with 8 GB of RAM the run only reached 590 iterations. On a supercomputer it only reached 3000 iterations.

Thank you
Adnan

check.f90 (4.5 KB)

ERROR:
"slurmstepd: error: Detected 1 oom-kill event(s) in StepId=7542083.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
I_MPI_JOB_TIMEOUT = -1 second(s): job ending due to startup timeout
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=7542083.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler
"

I disagree with your assessment. The memory leakage is caused by your code.

A cursory look at your Fortran code shows that in every loop iteration you create a new LAMMPS instance without closing the previous one, so the resources it has allocated are never released. Allocating a resource and not releasing it is the classical definition of a memory leak, and thus definitely a bug in your code.

Either create the LAMMPS instance outside the loop and issue a clear command at the beginning of each loop iteration to reset the instance (see the LAMMPS documentation for the clear command), or add at the end of each iteration

call lmp%close(.false.)

to close the LAMMPS instance (without finalizing MPI, because MPI cannot be re-initialized afterwards).
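
As a minimal sketch of the first option, assuming the LIBLAMMPS Fortran module is compiled with your program: the instance is created once, reset with the clear command in every iteration, and closed once at the end. The input file name in.step and the command-line arguments are placeholders I made up, since I am not reproducing your check.f90 here.

```fortran
PROGRAM reuse_instance
  USE LIBLAMMPS                 ! LAMMPS Fortran interface module
  IMPLICIT NONE
  TYPE(lammps) :: lmp           ! holds the single LAMMPS instance
  INTEGER :: i
  INTEGER, PARAMETER :: niter = 20000
  ! placeholder arguments: program name, and no log file
  CHARACTER(LEN=12), PARAMETER :: args(3) = &
      [ CHARACTER(LEN=12) :: 'liblammps', '-log', 'none' ]

  ! create the instance ONCE, outside the loop (this also initializes MPI)
  lmp = lammps(args)

  DO i = 1, niter
    ! reset the instance to its initial state, releasing the
    ! memory allocated by the previous iteration
    CALL lmp%command('clear')
    ! 'in.step' is a hypothetical input file for one iteration
    CALL lmp%file('in.step')
  END DO

  ! delete the instance and finalize MPI at the very end
  CALL lmp%close(.TRUE.)

  ! Second option, for comparison: keep "lmp = lammps(args)" inside
  ! the loop and end every iteration with
  !   CALL lmp%close(.FALSE.)
  ! so that MPI stays initialized for the next instance.
END PROGRAM reuse_instance
```

With the first option the cost of creating and destroying the instance is paid only once; the second option is the smaller change if your loop body already creates the instance.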

Please see the example for creating and deleting a LAMMPS object, and the documentation of the close() function in the LAMMPS Fortran interface.

When I run the corrected Fortran code (using the first method) with valgrind's memcheck tool, there are no leaks visible:

==317201== LEAK SUMMARY:
==317201==    definitely lost: 0 bytes in 0 blocks
==317201==    indirectly lost: 0 bytes in 0 blocks
==317201==      possibly lost: 0 bytes in 0 blocks
==317201==    still reachable: 2,430 bytes in 12 blocks
==317201==         suppressed: 159,203 bytes in 25 blocks
==317201== 

That said, this check has exposed a bug (uninitialized memory access) in the wall fixes you use, which needs to be addressed with the following patch:

  diff --git a/src/fix_wall_reflect.cpp b/src/fix_wall_reflect.cpp
  index 00ef968828..0169644e4a 100644
  --- a/src/fix_wall_reflect.cpp
  +++ b/src/fix_wall_reflect.cpp
  @@ -32,8 +32,7 @@ using namespace FixConst;
   /* ---------------------------------------------------------------------- */
   
   FixWallReflect::FixWallReflect(LAMMPS *lmp, int narg, char **arg) :
  -  Fix(lmp, narg, arg),
  -  nwall(0)
  +  Fix(lmp, narg, arg), nwall(0), varflag(0)
   {
     if (narg < 4) utils::missing_cmd_args(FLERR, "fix wall/reflect", error);
   

Thank you so much for your help. Now I understand the mistake I was making.