Enabling LAMMPS to restart simulation from same script

I have a LAMMPS script that I run hundreds or thousands of times on an HPC partition that is prone to killing jobs. As a result, I need to ensure that my LAMMPS simulations can be restarted automatically. However, despite poring through the documentation, it does not seem clear whether LAMMPS’s restart capabilities are able to meet my needs.

My simulation

My simulation can be broken down into the following steps:

  1. Read system.lammps data file
  2. Perform FIRE minimization (usually 1,000-2,000 timesteps)
  3. Perform NPT relaxation via fix (1,000 timesteps)
  4. Perform NPT quench via fix (10,000 timesteps)
  5. Write to system.lammpstrj file.

What I want

I want a single script that completes the five steps above, resuming from wherever the previous run ended. For example:

  • If the script has never been run before, then read the initial data file and run all the steps
  • If the previous run was killed 400 timesteps into the NPT relaxation, then resume from timestep 400 of the NPT relaxation (or from wherever the most recent restart file was written)
  • If the previous run only reached timestep 8000 of the quench, then resume from timestep 8000 of the NPT quench (or from wherever the most recent restart file was written)

I hope this is clear. Essentially I want to perform my minimization/relaxation/quenching steps without repeating what has been done previously before the job was killed.

The issue I’m facing

So far, unfortunately, I have been forced to break this up into two scripts. The crux of the issue is that I cannot tell LAMMPS whether a simulation has been run before. If I use read_data, the simulation always repeats all the steps from the beginning. If I use read_restart, LAMMPS tells me that there is no restart file. If I use both, I get an error that I can’t read a restart file when a simulation box is already defined.

My solution so far

What I’ve been able to do so far is split the script into two, with the following structure:

Script 1 (steps 1-3, no restart capabilities)

  1. Read system.lammps data file
  2. Perform FIRE minimization (usually 1,000-2,000 timesteps)
  3. Perform NPT relaxation via fix (1,000 timesteps)
    3.1. Write a restart file system-0.restart

Script 2 (steps 4-5, has restart capabilities)

4.1. Read system-*.restart file
4.2. Perform NPT quench via fix (10,000 timesteps) & write restart files system-*.restart every 1,000 timesteps
5. Write to system.lammpstrj file.

While this gets the job done by allowing the second script to restart from the restart files, I’m looking for a way to implement the whole pipeline into a single script. Is this possible?

This can be done with an if statement and the jump/label commands.

if $(is_file(system-0.restart)) then "jump SELF step4"

# put your steps 1-3 here

label step4

# put your steps 4-5 here. remember to use "upto" in the run command

Since your production run is much longer than the prep runs, it is not worth complicating things by trying to make those restartable. They account for only a small percentage of the total time. Also, if the machine is that unstable, it is worth the effort to look elsewhere, for example at national or regional supercomputing centers.

Thank you! Using an if statement effectively can help me do what I want.

In case I ever need to come back to this, note that you also need an if statement when defining the simulation box: use read_data if the restart file doesn’t exist, or read_restart if it does. After that, the other if statement can skip the minimization, the relaxation, and the writing of the initial restart file after relaxation.
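
For anyone landing here later, the pieces above might be combined roughly as follows. This is only a sketch under stated assumptions, not the script actually used in this thread: the minimization tolerances, the temperatures, pressures and damping constants, the fix IDs (r_npt, q_npt), the label names, and the use of write_dump for step 5 are all placeholders, and the force-field lines are left out on purpose. It folds the two if statements into a single branch, so a fresh start falls through steps 1-3 while a restart jumps straight to read_restart.

```lammps
# Sketch only: all numeric values, fix IDs and label names are placeholders.
# units / atom_style and similar settings needed before read_data go here.

# If the first restart file exists, steps 1-3 have already completed.
if "$(is_file(system-0.restart)) == 1" then "jump SELF resume"

# ---- fresh start: steps 1-3 ----
read_data       system.lammps
# pair_style / pair_coeff and other force-field settings go here

min_style       fire
minimize        0.0 1.0e-8 2000 20000

fix             r_npt all npt temp 300.0 300.0 0.1 iso 1.0 1.0 1.0
run             1000
unfix           r_npt

# Restart the step counter so the quench always spans timesteps 0..10000.
reset_timestep  0
write_restart   system-0.restart
jump            SELF quench

# ---- restart: pick up the newest system-<timestep>.restart ----
label           resume
read_restart    system-*.restart
# Fix definitions are not restored from a restart file, and some force-field
# settings may not be either, so repeat those settings here.

# ---- steps 4-5, shared by both paths ----
label           quench
fix             q_npt all npt temp 300.0 10.0 0.1 iso 1.0 1.0 1.0
restart         1000 system-*.restart
run             10000 upto start 0 stop 10000

write_dump      all atom system.lammpstrj
```

A few notes on the choices. The "upto" keyword makes run stop at timestep 10000 instead of running another 10,000 steps, so a job resumed at timestep 8000 only runs the remaining 2,000. The start/stop keywords only matter if the quench ramps a value such as temperature; they should make the ramp span the whole 0-10000 window instead of starting over on each resume. The "*" in read_restart selects the matching file with the largest timestep. Finally, jump SELF is not guaranteed to work when the script is fed through stdin, so launching with the -in switch (for example lmp -in in.script, where the file name is hypothetical) is the safer choice.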