Arch Linux recently switched from OpenMPI 1.8.5 to OpenMPI 1.8.6, and this somehow introduced a memory leak when used with LAMMPS. If I compile LAMMPS without any custom packages and run the melt example on one core, I see fairly normal memory usage: 15,484 K resident and 12,132 K shared. This is the case whether I run it as ./lmp -in in.melt or as mpirun -np 1 ./lmp -in in.melt.
However, if I run the same example on two cores, the shared memory usage stays constant while the resident usage slowly creeps up by about 4 K per 1,000 time steps.
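For what it's worth, the creep is easy to watch with a small /proc-based sampler like the sketch below (Linux-only; the script is my own, not part of LAMMPS — you would point it at the pid of one of the lmp ranks):

```shell
# Report VmRSS (resident set size, in kB) of a process from /proc.
# Usage idea: rss_kb <pid of an lmp rank>, called every few thousand steps.
rss_kb() {
    # The VmRSS line in /proc/<pid>/status looks like: "VmRSS:     15484 kB"
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Sample our own shell as a demonstration; for the leak you would
# substitute the pid of a running lmp process and diff successive samples.
rss_kb $$
```

Sampling the same rank twice and subtracting gives the growth rate directly; with a leak of ~4 K per 1,000 steps it takes a while to stand out, so sampling sparsely over a long run works better than sampling often.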
If I downgrade my system to OpenMPI 1.8.5, everything works as expected and there are no leaks.
I suppose the OpenMPI developers are the right place to report this, but I imagine it would help if the leak could first be isolated somehow, and for that some knowledge of LAMMPS's memory management would be useful, of which I have none. The trivial OpenMPI examples I could come up with showed no leakage.
If it is any help, I have attached some output from valgrind's memcheck from runs with one and two cores, for both OpenMPI 1.8.5 and 1.8.6. The two-core output shows lots of invalid reads/writes in LAMMPS_NS::AtomVecAtomic::pack_border for both versions, yet somehow LAMMPS only leaks with 1.8.6.
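For completeness, the attached logs were produced roughly like this (the exact flags are an approximation from memory; the %q specifier is valgrind's way of expanding an environment variable, here used so each MPI rank writes its own log instead of the ranks clobbering one file):

```shell
# Sketch of the valgrind-under-mpirun invocation (paths are assumptions).
NP=2
CMD="mpirun -np $NP valgrind --leak-check=full \
--log-file=openmpi_${NP}_core.rank-%q{OMPI_COMM_WORLD_RANK}.out \
./lmp -in in.melt"

# Echoed here rather than executed, since it needs a built lmp binary.
echo "$CMD"
```

Note that valgrind slows the run down enormously, so the logs cover far fewer time steps than the plain runs where I measured the growth rate.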
I guess this is what you get for living on the bleeding edge…
openmpi_1.8.5_1_core.out (991 Bytes)
openmpi_1.8.5_2_core.out (630 KB)
openmpi_1.8.6_1_core.out (991 Bytes)
openmpi_1.8.6_2_core.out (630 KB)