Parallel Tempering Simulations trajectory post-processing

Aditya_Ranganathan · February 4, 2016, 6:26am

Dear All,

I have been performing some replica exchange (parallel tempering) simulations using LAMMPS. I'm interested in the ‘configurations’ at one particular temperature. As I understand it, any particular output trajectory (output.$t.lammpstrj) contains configurations from varying temperatures. However, I need to analyze only the configurations at my temperature of interest, and would like a ‘trajectory’ that is consistent in temperature space.

I have read through previous mailing list conversations on this topic. However, I want to confirm that things are still the same in the newer builds of LAMMPS. Secondly, are there standard post-processing scripts out there to do this?

Thanks and Regards

Srivastav Ranganathan

PhD Student

IIT Bombay

Aditya_Ranganathan · February 4, 2016, 10:43am

I have used awk to split my trajectories and then concatenated the snapshots at the same temperature. The confusion was due to some earlier message in the mailing list.

I would suggest that the LAMMPS documentation clearly state that one has to post-process the trajectories in order to get a canonical trajectory. I will try to contribute a write-up and a couple of awk scripts for the post-processing.

Regards,

Srivastav R
PhD student
IIT Bombay

sjplimp · February 4, 2016, 3:55pm

I agree the doc page should explain this. How does this sound below? Note that when you say this:

would like a ‘trajectory’ consistent in the temperature space.
or
one would have to post process the trajectories in order to get a canonical trajectory.

I’m not sure what you mean. The dump file produced by each replica is a continuous trajectory, just one where the temperature varies. You can post-process to gather all the snapshots (one per time increment) at 300K. But it is no longer a “continuous trajectory” from one time frame to the next.

Steve

Aditya_Ranganathan · February 4, 2016, 4:10pm

What I wanted to say was that to analyze the configurations sampled at a given temperature, one needs snapshots that are all at that temperature. I'm sorry if it sounded otherwise. I suggested adding this to the documentation because I believe that most people would want to study the conformational space accessed by the system at some given temperature.

The documentation that you suggested looks clear enough, and I hope it will put the confusion to rest. I could send a standard awk script along with a C program to do the job. Could they be added to the documentation page?
I'm sure they would be useful for a lot of people.

Regards

sjplimp · February 5, 2016, 3:25pm

If you have a script that does this, we could release it in the tools dir.
I assume it would take 2 inputs: the set of per-replica dump files and
the master log file which indicates which replica owned each temperature
at different times. And take an additional param that specifies
the name of the output files (one per temperature).

Something like:

script log.master dump.replica.* "newdump.temp.*"
where the last arg is quoted since those files don’t exist.
The * is replaced by the numbers 0 to Nreplica-1.

Note that the script would need to have logic that
accounts for the fact that the snapshot time increment
in the dump may be less than, equal to, or greater than
the replica exchange increment in the master log file.
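
A minimal Python sketch of that lookup logic, not an existing LAMMPS tool. It assumes the master log has one purely numeric row per exchange step, with the timestep in the first column and, in the following columns, the temperature index currently held by replica 0, 1, 2, and so on; please check this against your own master log. The function names parse_master_log and replica_at are hypothetical. A snapshot whose timestep falls between two log rows is assigned using the most recent row at or before it, which covers dump increments smaller than, equal to, or larger than the exchange increment. Whether a frame written exactly on an exchange step should use that row or the previous one is a convention to verify for your LAMMPS version.

```python
from bisect import bisect_right


def parse_master_log(lines):
    """Collect the replica/temperature table from master log lines.

    Assumed layout (verify against your log): "step t0 t1 ... tN-1",
    where tR is the temperature index held by replica R at that step.
    Returns (steps, rows) with rows[i][R] the temperature index of
    replica R at steps[i].
    """
    steps, rows = [], []
    for line in lines:
        fields = line.split()
        # Skip headers and any other line that is not purely numeric.
        if len(fields) < 2 or not all(f.isdigit() for f in fields):
            continue
        steps.append(int(fields[0]))
        rows.append([int(f) for f in fields[1:]])
    return steps, rows


def replica_at(steps, rows, temp_index, timestep):
    """Index of the replica holding temp_index at the given timestep.

    Uses the latest log row at or before timestep, so the dump and
    exchange increments do not have to match.
    """
    i = bisect_right(steps, timestep) - 1
    if i < 0:
        raise ValueError("timestep precedes the first row of the master log")
    return rows[i].index(temp_index)
```

For example, with rows logged at steps 0, 1000 and 2000, a snapshot dumped at step 1500 is looked up in the row for step 1000.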

Steve

Aditya_Ranganathan · February 5, 2016, 7:04pm

I do the post-processing the following way. It might sound a little crude, and a simpler (and smarter) script could probably do the same.

i) Let's suppose that I have 5 different replicas and hence 5 trajectories (dump0.lammpstraj, dump1.lammpstraj, dump2.lammpstraj etc). I use an awk script to write one file per timepoint for each replica trajectory: time1.dump0.lammpstraj, time2.dump0.lammpstraj … timeN.dump0.lammpstraj, and similarly for all 5 replicas. So, at every timepoint (depending on the frequency with which the trajectories are written) you have a snapshot from each replica.

ii) I use a simple C program to read through the master log file and identify the replica that contains the snapshot with the temperature of interest at every time step. I write my snapshots at the same frequency as the exchange, so the C code is trivial. However, one can write a generic version of the same code that handles differing exchange and trajectory write frequencies (too lazy to do that at the moment ;-)). The C program can output a list of filenames that need to be concatenated (in ascending order of time).

The output of the C program would look like: “time0.dump0.lammpstraj, time2000.dump0.lammpstraj, time4000.dump1.lammpstraj, time6000.dump4.lammpstraj … timeN.dumpN.lammpstraj”

iii) One then simply runs the cat command over all the files listed in the output from step ii. { xargs cat < step2.out ; } > 310K_replica.lammpstrj
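
Steps i to iii can also be sketched in a single pass in Python, without the intermediate per-timepoint files. This is only an illustration, not the awk and C scripts mentioned in this post. It assumes LAMMPS text dumps in which every snapshot starts with an "ITEM: TIMESTEP" line followed by the timestep on the next line. The names split_frames, gather_temperature and owner_of are hypothetical; owner_of is a mapping from timestep to the index of the replica that holds the temperature of interest at that step, for instance built from the master log.

```python
def split_frames(lines):
    """Yield (timestep, frame_lines) for each snapshot in a text dump."""
    frame = None
    for line in lines:
        if line.startswith("ITEM: TIMESTEP"):
            # A new snapshot begins: emit the previous one if complete.
            if frame is not None and len(frame) > 1:
                yield int(frame[1].split()[0]), frame
            frame = []
        if frame is not None:
            frame.append(line)
    if frame is not None and len(frame) > 1:
        yield int(frame[1].split()[0]), frame


def gather_temperature(dumps, owner_of):
    """Concatenate the frames that sit at one temperature.

    dumps    -- list of line iterables, one per replica (index = replica)
    owner_of -- dict: timestep -> replica index at the target temperature
    Returns the selected frames' lines in ascending order of time.
    """
    # Index every replica's frames by timestep (held in memory).
    frames = [dict(split_frames(d)) for d in dumps]
    out = []
    for step in sorted(owner_of):
        frame = frames[owner_of[step]].get(step)
        if frame is not None:  # skip steps that were not dumped
            out.extend(frame)
    return out
```

Each entry in dumps can be an open file object, and the returned lines can be written out with writelines. Because all frames are indexed in memory, very large trajectories would need a streaming variant.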

I basically have 2 scripts (one awk and a C program to process the master log). The scripts are in the crudest form and got my job done. People can use them and make them smarter and more generic to suit their needs. If it sounds good, I will attach the scripts in this thread.

Thanks and Regards

Srivastav Ranganathan
PhD Student

IIT Bombay,
Powai,
Mumbai, 400076