Memory leak with OpenMPI 1.8.6

Dear all,

Arch Linux recently switched from OpenMPI 1.8.5 to OpenMPI 1.8.6, and this somehow introduced a memory leak when used with LAMMPS. If I compile LAMMPS without any custom packages and run the melt example on one core, I see normal memory usage of 15,484 K real and 12,132 K shared. This is the case whether I run it as ./lmp -in in.melt or as mpirun -np 1 ./lmp -in in.melt.

However, if I run the same example on two cores, the shared memory usage remains constant but the real usage slowly creeps up by about 4 K per 1000 time steps.

If I downgrade my system to the state before OpenMPI 1.8.6 replaced 1.8.5, everything works as expected and there is no leak.

I suppose the OpenMPI developers are the right place to report this, but I imagine it would help to first isolate the leak somehow, and for that some knowledge of the LAMMPS memory management would probably be needed, of which I have none. The trivial OpenMPI examples I could come up with showed no leakage.
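For what it is worth, the kind of trivial test I mean is roughly the sketch below: two ranks repeatedly exchanging a small buffer every step, a bit like the per-step border communication in LAMMPS. It is only an illustration of the sort of thing I tried, not the exact program I ran:

#include <mpi.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Exchange a small buffer with the neighbouring rank over and over,
       then watch the resident memory of both processes from outside. */
    double sendbuf[1024], recvbuf[1024];
    memset(sendbuf, 0, sizeof(sendbuf));
    int partner = (rank + 1) % size;

    for (long step = 0; step < 1000000; step++) {
        MPI_Sendrecv(sendbuf, 1024, MPI_DOUBLE, partner, 0,
                     recvbuf, 1024, MPI_DOUBLE, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

Built with mpicc and run as mpirun -np 2 ./a.out, nothing along those lines showed any growth here, so plain point-to-point traffic does not seem to be enough to trigger it.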

If it is any help, I have attached output from valgrind’s memcheck for runs with one and two cores, with both OpenMPI 1.8.5 and 1.8.6. The two-core output shows lots of invalid reads/writes in LAMMPS_NS::AtomVecAtomic::pack_border for both OpenMPI 1.8.6 and 1.8.5, but somehow LAMMPS only leaks with 1.8.6.
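(In case anyone wants to reproduce those runs, something along the lines of mpirun -np 2 valgrind --leak-check=full ./lmp -in in.melt should give equivalent output; the exact valgrind options I used may have been slightly different.)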

I guess this is what you get for living on the bleeding edge…

openmpi_1.8.5_1_core.out (991 Bytes)

openmpi_1.8.5_2_core.out (630 KB)

openmpi_1.8.6_1_core.out (991 Bytes)

openmpi_1.8.6_2_core.out (630 KB)

Looks like you are not alone:

http://www.open-mpi.org/community/lists/users/2015/06/27231.php

Perhaps you should chime in and note that it may not be CUDA-related...

> I suppose the OpenMPI developers are the right place to report this, but I imagine it would help to first isolate the leak somehow, and for that some knowledge of the LAMMPS memory management would probably be needed, of which I have none. The trivial OpenMPI examples I could come up with showed no leakage.

Nothing on the LAMMPS side is likely to help. A few years ago OpenMPI had memory leaks that tripped up various apps. Sounds like they’ve re-introduced one.

On a related note, last time I checked, running valgrind on a LAMMPS/OpenMPI job produces many annoying memory error messages at code locations internal to OpenMPI, including various un-freed memory blocks at the end of the run. That is not the same as a leak, but more typically indicates incomplete clean-up.
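(If the noise gets in the way, OpenMPI ships a valgrind suppression file, typically installed as share/openmpi/openmpi-valgrind.supp under its installation prefix, which can be passed to valgrind with --suppressions= to hide most of the reports that come from inside OpenMPI itself.)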

Steve

Right, thanks. I have sent a mail to the OpenMPI mailing list, just to make sure they know it is (probably) not CUDA-related. For now I will just stick with OpenMPI 1.8.5; everything works fine with that version.

So, this was indeed a bug on the OpenMPI side (as already suggested by the fact that the same LAMMPS build works fine with OpenMPI 1.8.5). It will be fixed in the next release: http://www.open-mpi.org/community/lists/users/2015/07/27260.php

Thanks for following up on this with the OpenMPI folks.

Steve