write restart out-of-memory

_Haidong_Fan_SCU · May 7, 2020, 9:36am

Dear lammps-users,
I am running a 298,598,400-atom simulation (eam/alloy potential) on 8 nodes (32 CPUs, 128 GB memory, 4 GPUs, 16 GB memory per GPU).
LAMMPS runs well except when writing restart files.
With "restart 2000 restart.*.%.mg fileper 128", the job is killed for running out of memory. The last restart file is created, but its size is 0 KB. If I remove the command, LAMMPS runs fine. I saw that memory usage is 70% of host memory and 84% of GPU memory.

I am wondering why memory usage is so heavy when writing a restart file.
Is there any way to avoid this error without reducing the system size?
Best,
Haidong

akohlmey · May 7, 2020, 10:17am

when writing a restart, LAMMPS needs to allocate a buffer to pack all local per-atom data into before writing it to the restart file.
for a large system, that buffer can become pretty large.
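To get a feel for the numbers, here is a rough Python sketch of that buffer size. It assumes the buffer holds all local atoms of the busiest rank and that each atom packs into 11 doubles (an assumed figure for atom_style atomic; fixes that store per-atom restart data would add more). The MPI rank count is not given in the post, so two guesses are tried. Treat the result as a lower-bound estimate, not a measurement of what LAMMPS actually allocates.

```python
# Rough, hypothetical estimate of the per-rank buffer used to pack local
# per-atom data for a restart file. This is NOT LAMMPS source code.
# ASSUMPTIONS (not from the thread):
#   * atoms are spread evenly over MPI ranks;
#   * each atom packs into VALUES_PER_ATOM doubles; 11 is a guess for
#     atom_style atomic (x[3], v[3], tag, type, mask, image, plus a size slot);
#   * fixes that store per-atom restart data would add more values per atom.

BYTES_PER_DOUBLE = 8
VALUES_PER_ATOM = 11  # assumed, see above


def restart_buffer_bytes(n_atoms: int, n_ranks: int,
                         values_per_atom: int = VALUES_PER_ATOM) -> int:
    """Bytes needed to pack the local atoms of the busiest rank."""
    atoms_per_rank = -(-n_atoms // n_ranks)  # ceiling division
    return atoms_per_rank * values_per_atom * BYTES_PER_DOUBLE


if __name__ == "__main__":
    n_atoms = 298_598_400  # system size from the post
    n_nodes = 8
    # Rank count is a guess: 1 rank per GPU (8 x 4) or 1 rank per core (8 x 32).
    for n_ranks in (32, 256):
        per_rank = restart_buffer_bytes(n_atoms, n_ranks)
        per_node = per_rank * n_ranks / n_nodes
        print(f"{n_ranks:4d} ranks: {per_rank / 2**20:8.1f} MiB per rank, "
              f"{per_node / 2**30:5.2f} GiB per node")
```

Under these assumptions the per-node total does not depend on how the atoms are split across ranks, so changing the rank count alone would not shrink it.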
the only alternative that I can think of would be to see if you can use write_data instead. it may need less of a buffer, since it writes the data file line by line and velocities are in a separate section.
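The idea of writing "line by line" with velocities in a separate section can be illustrated with a generic Python sketch. This is not how write_data is implemented; it only shows that when per-atom lines are streamed in fixed-size chunks, and coordinates and velocities go into separate sections, the amount of buffered data is bounded by the chunk size rather than by the number of atoms. The section names follow the LAMMPS data-file layout (`Atoms # atomic`, `Velocities`), but the header and box lines are omitted, and the function name and `chunk` parameter are hypothetical.

```python
from typing import Callable, Iterator, List, TextIO, Tuple

# Hypothetical per-atom record: id, type, x, y, z, vx, vy, vz
Atom = Tuple[int, int, float, float, float, float, float, float]


def write_sections_chunked(out: TextIO, atoms: Callable[[], Iterator[Atom]],
                           chunk: int = 1000) -> int:
    """Stream Atoms and Velocities sections in chunks; return peak buffered lines."""
    peak = 0
    lines: List[str] = []

    def flush() -> None:
        nonlocal peak
        peak = max(peak, len(lines))  # track the largest buffer ever held
        out.write("".join(lines))
        lines.clear()

    # Section 1: positions only (id type x y z)
    out.write("\nAtoms # atomic\n\n")
    for a in atoms():
        lines.append(f"{a[0]} {a[1]} {a[2]:.10g} {a[3]:.10g} {a[4]:.10g}\n")
        if len(lines) == chunk:
            flush()
    flush()

    # Section 2: velocities only (id vx vy vz), a second pass over the atoms
    out.write("\nVelocities\n\n")
    for a in atoms():
        lines.append(f"{a[0]} {a[5]:.10g} {a[6]:.10g} {a[7]:.10g}\n")
        if len(lines) == chunk:
            flush()
    flush()
    return peak
```

For example, streaming 2,500 atoms with `chunk=1000` never holds more than 1,000 lines in memory at once, regardless of how many atoms there are.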

axel.

_Haidong_Fan_SCU · May 7, 2020, 12:08pm

Thank you for your reply.
write_data is possibly a good choice, but it cannot be used with %, so the data file would be huge if I increase the system beyond 2 billion atoms. One alternative is to dump the atomic coordinates and velocities and then use read_dump. However, I found that dump can also be killed for running out of memory, so I guess the buffer issue applies to dump as well. If I reduce the simulation cell, how much memory usage is safe for running, dumping, and writing restarts?
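One crude way to approach the "how much is safe" question is to scale the measured memory use linearly with atom count and leave room for whatever extra memory the output step needs. The Python sketch below does that. The linear scaling, the extra bytes per atom, and the 0.9 safety factor are all assumptions; the extra cost is best measured by writing a restart or dump on a smaller test system. GPU memory (84% here) is a separate limit and needs the same check on its own.

```python
# Hypothetical capacity estimate; assumes memory grows linearly with atom count.

def max_safe_atoms(n_atoms_now: int, used_fraction: float,
                   mem_per_node: float, n_nodes: int,
                   extra_bytes_per_atom: float, safety: float = 0.9) -> int:
    """Largest atom count whose run + output memory stays under safety * total."""
    total = mem_per_node * n_nodes
    # Per-atom memory inferred from the usage observed during the run
    run_bytes_per_atom = used_fraction * total / n_atoms_now
    return int(safety * total / (run_bytes_per_atom + extra_bytes_per_atom))


if __name__ == "__main__":
    # Numbers from the thread: 298,598,400 atoms, ~70% of 128 GB on each of 8 nodes.
    # extra_bytes_per_atom=88 (11 doubles) is only a placeholder guess;
    # replace it with a value measured on a smaller test system.
    print(max_safe_atoms(298_598_400, 0.70, 128e9, 8, extra_bytes_per_atom=88))
```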
