How to restart parallel tempering

Hi,

I have finished a 200 ns parallel tempering run, and I used this command to write the restart files:

restart 20000000 restart.$Q.1 restart.$Q.2

Q is a world variable defined as: variable Q world 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

The last temperature ordering in the log.lammps file is:
200000000 7 3 28 37 30 35 33 26 11 18 36 24 2 1 15 4 9 16 8 19 0 6 14 32 17 27 34 39 29 38 5 23 25 31 20 22 40 10 21 13 12
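Copying 41 indices by hand is error-prone, so here is a minimal Python sketch of how that last log line could be turned into the world variable used for the restart. The function name `temper_index_variable` and the variable name `w` are illustrative choices, and the sketch assumes the line has the form "step T0 T1 ..." as shown above.

```python
def temper_index_variable(log_line: str, name: str = "w") -> str:
    """Turn one 'step T0 T1 ...' line into a 'variable <name> world ...' command."""
    fields = log_line.split()
    step = int(fields[0])
    indices = [int(f) for f in fields[1:]]
    # Every temperature index must appear exactly once across the replicas.
    if sorted(indices) != list(range(len(indices))):
        raise ValueError(f"step {step}: indices are not a permutation of 0..{len(indices) - 1}")
    return f"variable {name} world " + " ".join(str(i) for i in indices)


# The last line quoted from log.lammps above.
last_line = (
    "200000000 7 3 28 37 30 35 33 26 11 18 36 24 2 1 15 4 9 16 8 19 0 6 14 32 "
    "17 27 34 39 29 38 5 23 25 31 20 22 40 10 21 13 12"
)
command = temper_index_variable(last_line)
print(command)
```

The permutation check catches a truncated or mis-copied line before it ends up in an input script.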

Then I used this input file to restart the parallel tempering:

# ******Initialization

boundary p p p
newton off
neigh_modify delay 5 every 5 check yes

variable Q world 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
variable w world 7 3 28 37 30 35 33 26 11 18 36 24 2 1 15 4 9 16 8 19 0 6 14 32 17 27 34 39 29 38 5 23 25 31 20 22 40 10 21 13 12

pair_style mgp

# ***read_data

read_restart restart.$Q.2

variable t world 300.0 320.0 340.0 360.0 380.0 400.0 420.0 440.0 460.0 480.0 500.0 520.0 540.0 560.0 580.0 600.0 610.0 615.0 620.0 625.0 630.0 635.0 640.0 650.0 655.0 660.0 665.0 670.0 680.0 690.0 700.0 720.0 740.0 760.0 780.0 800.0 820.0 840.0 860.0 880.0 900.0
pair_coeff * * mapped_model_7_3b_energy_True_opt_False.mgp Au yes yes

# ****Define pair styles

thermo 0
thermo_style custom step temp etotal pe ke vol press lx ly lz
thermo_modify flush yes

timestep 0.002

velocity all create 300.0 4928459

variable STEP equal step
variable TEMP equal temp
variable ETOTAL equal etotal
variable PE equal pe
variable KE equal ke
variable VOL equal vol
variable PRESS equal press
variable LX equal lx
variable LY equal ly
variable LZ equal lz
variable PXX equal pxx
variable PYY equal pyy
variable PZZ equal pzz

dump 1 all custom 20000 Au_147.$Q.lammpstrj id type x y z

dump_modify 1 sort id

fix thermo_output all print 2000 "${STEP} ${TEMP} ${ETOTAL} ${PE} ${KE} ${VOL} ${PRESS} ${LX} ${LY} ${LZ} ${PXX} ${PYY} ${PZZ}" file thermo.$Q.lammps title "#step temp etotal pe ke vol press lx ly lz pxx pyy pzz"

fix COM all momentum 1 linear 1 1 1 angular

restart 20000000 restart.$Q.1 restart.$Q.2
fix myfix all nvt temp $t $t 0.1
temper 40000000 2000 $t myfix 36312 12122 $w

run 5000000

write_data Au_147.$Q.data

However, it fails with this error:
srun: error: p-sc-2178: tasks 0-1,3,8-9,11,15-16,20-21,23-24,26,37,39-40,43-44: Killed

srun: error: p-sc-2178: tasks 2,4-7,10,12-14,17-19,22,25,27-36,38,41-42: Exited with exit code

May I ask for help with this problem? Also, I don’t know which restart file to pass to the read_restart command, since ‘restart 20000000 restart.$Q.1 restart.$Q.2’ writes two restart files for each replica.

Thanks and I really appreciate any potential response.

Kevin

You are making it quite difficult to help you.

For example, your quoted input is not easy to read since you didn’t quote it correctly. I have reminded you of that before. Please respect the guidelines of this forum category or you are likely to be ignored.

This only states that the batch system noticed an error. It does not provide any information about what that error was and is thus mostly useless.

Hi Dr. Akohlmey, I’m so sorry for my confusing quoted input.
I’m trying to restart a parallel tempering simulation, and the input file of the original run is:

[screenshot]
From this, for each replica, I generate two restart files as restart.$Q.1 and restart.$Q.2.
Then, to restart this simulation, I defined a new world variable w as variable w world 7 3 28 37 30 35 33 26 11 18 36 24 2 1 15 4 9 16 8 19 0 6 14 32 17 27 34 39 29 38 5 23 25 31 20 22 40 10 21 13 12 and the rest of my input file is:

Since there are two restart files, may I ask which one I should read (restart.$Q.1 or restart.$Q.2)?

And my new restart run fails with this error:

Could you please help me resolve this issue? I searched for this error, ‘MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD with errorcode 1.’, but I don’t know how to fix it.

For reference, in the new restart simulation I provided the restart.$Q.2 files, the force field, the input file, and my shell script. Here is my shell script, which worked well with the previous parallel tempering run:

Thanks and I appreciate your response.

Using screenshots does not help; one cannot cut-n-paste from images. Why don’t you just follow the advice you have been given? It is also explained in the forum guidelines post.

The one that has the more recent timestamp, of course.
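For anyone who would rather script that check than compare listings by eye, here is a minimal Python sketch that picks the more recently modified of the two files for one replica. The function name `newest_restart` and its `directory` parameter are illustrative, and the sketch assumes the `restart.<replica>.1` / `restart.<replica>.2` naming used in this thread.

```python
import os


def newest_restart(replica: int, directory: str = ".", prefix: str = "restart") -> str:
    """Return the path of the most recently modified restart file for one replica."""
    candidates = [os.path.join(directory, f"{prefix}.{replica}.{k}") for k in (1, 2)]
    existing = [path for path in candidates if os.path.exists(path)]
    if not existing:
        raise FileNotFoundError(f"no restart files found for replica {replica}")
    # The file with the larger modification time was written last.
    return max(existing, key=os.path.getmtime)
```

For example, `newest_restart(0)` returns whichever of `restart.0.1` and `restart.0.2` in the current directory was written last.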

This is not the error message that matters as this is only for the “universe” MPI rank. The error happens within the individual partitions and thus you need to look at their log or screen files.
If those are empty, consider (for debugging purposes) turning off I/O buffering by adding the -nb command line flag.
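With 41 partitions there are a lot of files to open, so a small script can help. The following Python sketch scans the per-partition files for lines containing "ERROR". It assumes the default per-partition names `log.lammps.N` and `screen.N`; adjust the patterns if the run used different -log or -screen settings. The function name `find_partition_errors` is illustrative.

```python
import glob
import os


def find_partition_errors(directory=".", patterns=("log.lammps.*", "screen.*")):
    """Collect (path, line number, text) for every line containing 'ERROR'."""
    hits = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(directory, pattern))):
            # errors="replace" avoids a crash on stray non-UTF-8 bytes in the output.
            with open(path, errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if "ERROR" in line:
                        hits.append((path, lineno, line.strip()))
    return hits
```

Printing each entry of `find_partition_errors()` shows which partition failed and the message it reported.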

Hi Dr. Akohlmey, thanks for your kind reply. I will read the guidelines and follow the quoting rule. I have also solved this issue: the pair style has to be defined after the read_restart command, as the mgp pair style does not seem to be stored in restart files.
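For others hitting the same problem, here is a rough Python sketch of a check for that ordering in an input script. It only looks at the first word of each line and treats everything after a `#` as a comment, which is a simplification of the real input parsing, and the function name `pair_style_precedes_restart` is illustrative.

```python
def pair_style_precedes_restart(script_text: str) -> bool:
    """Return True if a pair_style command appears before the first read_restart."""
    seen_pair_style = False
    for raw_line in script_text.splitlines():
        # Simplification: drop everything after '#', ignoring quoted '#' characters.
        words = raw_line.split("#", 1)[0].split()
        if not words:
            continue
        if words[0] == "read_restart":
            return seen_pair_style
        if words[0] == "pair_style":
            seen_pair_style = True
    # No read_restart command in the script, so there is nothing to flag.
    return False
```

Applied to the input posted at the top of this thread, where pair_style mgp comes before read_restart, the function returns True.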


Thanks for reporting back.