A method of seamlessly appending a log to a calculation starting from a restart file

maaliso · September 30, 2024, 10:30am

I am trying to understand how the log append can be useful in restarting the calculation from a restart file.

Here is the situation.
I start an MD simulation that is allotted 15 hours, but it needs 18 hours to finish. Because I underestimated the run time, the job is cut off, and the log file now ends with an incomplete line at an arbitrary timestep.

Here are the last 2 lines for comparison:

991000   600.16942     -204030.34     -202013.39     -1.7232312      360813.26      677.45993      25.955502      20.519626      90             90             90             182.67295     -1117.7095      929.86681     -817.12833     -31.053616      11.832623          26000 
991500   595.69589     -204044.91     -202042.99     -1244.8861      360857.44      677.54289      25.955502      20.519626      90             90             90            -985.72463     -1758.3697     -990.56402     -1212.0344      212.33054     -126.53518

Notice that the last column is empty on the last line, but it shouldn’t be.

To fix this, one should restart the calculation from the last restart file that was written, which is at timestep 990,000. The append option of the log command seems like the right choice here. However, using
log simulation.log append
only appends and doesn’t remove the timesteps between 990,000 and the last written one at 991,500. This leaves the log file ugly and impossible to parse correctly without deleting those lines by hand.

I want to ask whether there is a way to use the restart file to start the simulation at a certain timestep and have the log file seamlessly joined between the original run and the restarted one, without the extra lines in the middle, i.e., steps [990,000-991,500].

My idea of a perfect log in this case is one where all the lines from the original simulation in the interval [990,000-991,500] are deleted and the log continues with the output of the restarted run from 990,000 to the end. I can almost see Axel writing the comment that my idea is wrong, but I wanted to share it anyway, since maybe a solution is out there.

Thanks for your help.
Using LAMMPS versions Aug23 and Jan24
Ubuntu 22.04 system
Simulating between 100 and 26000 atoms using MD with fix npt, nvt, and ave/time

akohlmey · September 30, 2024, 11:27am

Yes, your idea is fundamentally flawed. In general, appending to files is a very bad idea in the context of batch system runs, which may be terminated prematurely or may crash because of platform instability, bugs in the code, or some other reason.

Modifying an existing file in an automated fashion would be highly complex, and any failure there could destroy more (valid) data and make flaws or inconsistencies much harder to detect. Thus having separate files is the preferred option, since you can easily see the data corruption and fix it with a text editor.
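For column-format thermo output, spotting the damaged rows in a separate file can be as simple as counting columns. The following is only a rough sketch: the function names and the idea of passing in the expected column count are illustrative assumptions, not anything LAMMPS provides. It also cannot detect a row that was cut off in the middle of its last number, since that row still has the right number of columns.

```python
def is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def check_thermo_rows(lines, n_columns):
    """Sort all-numeric lines into complete rows and suspect (short or long) rows.

    n_columns is the number of thermo columns you expect (an assumption the
    caller has to supply, e.g. by counting the header keywords).
    """
    complete, suspect = [], []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # Skip blank lines and anything that is not purely numeric,
        # such as the thermo header or other log messages.
        if not fields or not all(is_number(f) for f in fields):
            continue
        if len(fields) == n_columns:
            complete.append(fields)
        else:
            # Wrong column count: most likely a row cut off mid-write.
            suspect.append((number, line))
    return complete, suspect
```

Called on the lines of one log file, this returns the usable rows plus the line numbers of the rows to inspect by hand.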

I also strongly recommend using YAML format output, e.g. with thermo_style yaml, or with thermo_style custom in combination with thermo_modify line yaml. This output format is particularly easy to extract from log files. If you store the data in a dictionary indexed by the timestep number, any later data overwrites the earlier data for the same step, so extracting a continuous log from a sequence of runs, some of which were interrupted, is particularly straightforward.

We have implemented “append” functionality for some output options, mostly to shut up people that kept asking for it, but it is not considered a good idea, neither for log files, nor dump files, nor time averaging output files.

hothello · September 30, 2024, 12:04pm

Very wise

maaliso · September 30, 2024, 12:31pm

Thanks for the suggestions. I will read more.