[lammps-users] Converting a lammps dump file to lammps data file

JoeZ · December 3, 2020, 3:37pm

Hi all,

Hope you are doing well.

I’m trying to convert my LAMMPS dump files to LAMMPS data files so that I can modify the domain using the information in the last dump file and re-run the simulation later. My domain is triclinic and, according to the LAMMPS documentation, the box bounds are written to the dump file as:

ITEM: BOX BOUNDS xy xz yz xx yy zz
xlo_bound xhi_bound xy_tilt
ylo_bound yhi_bound xz_tilt
zlo_bound zhi_bound yz_tilt

When I copy this information into my data file as:

xlo_bound xhi_bound xlo xhi
ylo_bound yhi_bound ylo yhi
zlo_bound zhi_bound zlo zhi
xy_tilt xz_tilt yz_tilt xy xz yz

the box size changes and the potential energy of the whole system rises sharply, which blows atoms away.

Does anyone have an idea of what happened?

Sincerely,
Joe

akohlmey · December 3, 2020, 3:47pm

two things:

1) the boundaries written to the dump file are not the same as the lo/hi values in the data file. the math to convert this is a bit messy. it took me a while to figure it out so the molfile plugin for LAMMPS works correctly in VMD. thus the simple approach would be to just use VMD (which can write LAMMPS data files through the TopoTools plugin) and take it from there.

2) if your intention is to just continue simulations with the same system, but then use updated coordinates and box information from a dump file, you don’t need to write a new data file. you can just load the original data file and then use “read_dump” to update the information you want to update from a dump file (provided the information you seek is included, but that is a requirement for your original method anyway).
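
For readers who want to see the conversion from the first point spelled out, here is a minimal Python sketch. It assumes the relations given in the LAMMPS triclinic documentation: xlo_bound = xlo + min(0, xy, xz, xy+xz), xhi_bound = xhi + max(0, xy, xz, xy+xz), ylo_bound = ylo + min(0, yz), yhi_bound = yhi + max(0, yz), with the z bounds unchanged. The function name and the example numbers are made up for illustration and are not taken from any of the tools mentioned in this thread.

```python
def dump_bounds_to_data_box(xlo_b, xhi_b, ylo_b, yhi_b, zlo_b, zhi_b, xy, xz, yz):
    """Convert 'ITEM: BOX BOUNDS xy xz yz' values from a dump file
    into the lo/hi values expected in a data file header."""
    # The dump file stores the bounding box of the tilted cell,
    # so the tilt contributions have to be subtracted again.
    xlo = xlo_b - min(0.0, xy, xz, xy + xz)
    xhi = xhi_b - max(0.0, xy, xz, xy + xz)
    ylo = ylo_b - min(0.0, yz)
    yhi = yhi_b - max(0.0, yz)
    # z is not affected by any tilt factor.
    return {"xlo": xlo, "xhi": xhi, "ylo": ylo, "yhi": yhi,
            "zlo": zlo_b, "zhi": zhi_b, "xy": xy, "xz": xz, "yz": yz}


def data_header_lines(box):
    """Format the box as the four header lines of a triclinic data file."""
    return [
        f"{box['xlo']} {box['xhi']} xlo xhi",
        f"{box['ylo']} {box['yhi']} ylo yhi",
        f"{box['zlo']} {box['zhi']} zlo zhi",
        f"{box['xy']} {box['xz']} {box['yz']} xy xz yz",
    ]


# Made-up example: a 10 x 10 x 10 cell with xy = 2, xz = -1, yz = 1
# appears in the dump file with x bounds (-1, 12) and y bounds (0, 11).
box = dump_bounds_to_data_box(-1.0, 12.0, 0.0, 11.0, 0.0, 10.0, 2.0, -1.0, 1.0)
header = data_header_lines(box)
print("\n".join(header))
```

Copying the dump bounds straight into the data file, as in the original question, would make this example cell 13 wide in x instead of 10, which is the kind of box-size change described above.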

in general, it is often a very good idea to output (binary) restart files during a simulation and at the end of a simulation. they allow continuation of simulations and can also be easily converted into a data file (for further manipulations) with LAMMPS itself.
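
Both routes through LAMMPS itself come down to a very short input script. The Python sketch below only assembles the text of such scripts; it does not run LAMMPS. The file names, the timestep, and the “units” and “atom_style” values are placeholders that must match your own system, and a real input may need further style commands (for example bond or pair styles) before “read_data” accepts the original data file.

```python
def dump_to_data_input(data_file, dump_file, timestep, out_file,
                       units="real", atom_style="full"):
    """Build a LAMMPS input that loads the original data file, overwrites
    coordinates and box from one dump snapshot, and writes a new data file.
    units and atom_style are assumptions; use the values of your own run."""
    lines = [
        f"units {units}",
        f"atom_style {atom_style}",
        f"read_data {data_file}",
        # x y z are taken from the dump; "box yes" also replaces the box.
        f"read_dump {dump_file} {timestep} x y z box yes replace yes",
        f"write_data {out_file}",
    ]
    return "\n".join(lines) + "\n"


def restart_to_data_input(restart_file, out_file):
    """Build a LAMMPS input that converts a binary restart file to a data file."""
    return f"read_restart {restart_file}\nwrite_data {out_file}\n"


# Hypothetical file names, reused from the dump2data.py example in this thread.
script = dump_to_data_input("data_file", "dump.lammpstrj", 10000, "new_data_file")
print(script)
print(restart_to_data_input("final.restart", "new_data_file"))
```

The printed text can be saved to a file and passed to the LAMMPS executable in the usual way.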

axel.

JoeZ · December 3, 2020, 4:18pm

Hi Axel,

Thanks for your reply. Appreciate your help!

Sincerely,
Joe

jewettaij · December 4, 2020, 1:31am

Please disregard part “c)” of my post. I found the relevant documentation and attempted to fix “dump2data.py” accordingly. (But I confess I’m still too lazy to run a triclinic simulation to test dump2data.py. If somebody sends me a short triclinic dump file, I’d be happy to use it to test “dump2data.py”.)

jewettaij · December 4, 2020, 1:24am

I hesitate to add to what Axel said, because in this particular case, the VMD plugin that Axel suggested will probably work best. However, more generally there are at least three other automatic ways to convert dump files into data files (and hopefully some of them work with triclinic dump files):

1) Take a look at the “read_dump” and “write_data” commands that come with LAMMPS.

2) The pizza.py tools can probably also convert dump file information into LAMMPS data files:

https://pizza.sandia.gov/doc/dump.html

https://pizza.sandia.gov/doc/data.html

3) You can also try the “dump2data.py” script. (You don’t need to download all the moltemplate tools. The script file is available by itself here.) When I wrote this program, I attempted to add support for triclinic boundary conditions. But I haven’t tested that feature yet, because I’m too lazy to generate the triclinic simulation data needed for testing. (If you send me your dump file and a data file, I will give it a try.) Again, for this reason, perhaps it is better to use Axel’s VMD plugin instead, since he has debugged this feature. Either way, in case you are curious, here is an example of how to use “dump2data.py”:

dump2data.py -t 10000 data_file < dump.lammpstrj > new_data_file

This will convert a snapshot at timestep “10000” from your dump file (“dump.lammpstrj”) into “new_data_file” (using the original “data_file” to fill in the information that is not present in the dump file, such as bond topology).

Caveats:
a) I think the “dump2data.py” script automatically unwraps the coordinates of the atoms in your simulation and throws away the image flags. This probably makes it useless for you if you plan to change the size of the simulation box.
b) The “dump2data.py” script is a bit slow when reading large trajectories. To speed it up, use a text editor (or the unix “head” and “tail” commands) to extract the text of the frame that you want to convert (e.g. the frame at timestep 10000), save it as a separate file, and run dump2data.py on that file (instead of the original “dump.lammpstrj” file).
c) The “dump2data.py” script copies the xlo, xhi, ylo, yhi, zlo, zhi, xy, xz, yz information directly from the “BOX BOUNDS” section of the dump file. As Axel pointed out, this might not be correct. (If you point me in the direction of documentation

d) If velocities are present in your dump file, I think that “dump2data.py” will copy them into your new data file, but I could be wrong about this. The same goes for non-point-like particles, such as dipoles or ellipsoids. (I have probably only tested this script on trajectories that contain ordinary point-like atoms.)
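
As an alternative to the text editor or “head”/“tail” approach in caveat b), a single frame can be pulled out of a text dump file with a few lines of Python. This is a minimal sketch that assumes the standard text dump layout, where every snapshot starts with an “ITEM: TIMESTEP” line followed by the timestep on the next line. The function name and the tiny in-memory trajectory are made up for illustration.

```python
import io


def extract_frame(stream, timestep):
    """Return the lines of the first snapshot whose TIMESTEP equals `timestep`.
    Returns an empty list if no such snapshot is found."""
    frame = []
    in_frame = False      # True while we are inside the wanted snapshot
    expect_step = False   # True on the line right after "ITEM: TIMESTEP"
    for line in stream:
        if line.startswith("ITEM: TIMESTEP"):
            if in_frame:
                break     # the next snapshot begins, so we are done
            expect_step = True
            continue
        if expect_step:
            expect_step = False
            if int(line.split()[0]) == timestep:
                in_frame = True
                frame.append("ITEM: TIMESTEP\n")
                frame.append(line)
            continue
        if in_frame:
            frame.append(line)
    return frame


# Made-up two-frame trajectory with a single atom.
demo = """ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
1
ITEM: BOX BOUNDS pp pp pp
0 1
0 1
0 1
ITEM: ATOMS id type x y z
1 1 0.1 0.1 0.1
ITEM: TIMESTEP
10000
ITEM: NUMBER OF ATOMS
1
ITEM: BOX BOUNDS pp pp pp
0 1
0 1
0 1
ITEM: ATOMS id type x y z
1 1 0.2 0.2 0.2
"""

frame = extract_frame(io.StringIO(demo), 10000)
print("".join(frame), end="")
```

For a real trajectory, pass an open file object instead of the “io.StringIO” demo, write the returned lines to a separate file, and feed that file to dump2data.py in place of the full “dump.lammpstrj”.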

This is probably not helpful for you (Joe Zhang), but perhaps it will be useful to somebody who stumbles into this post later on.
Cheers

Andrew

Regarding Axel’s advice on binary restart files: for what it’s worth, I had huge problems using binary restart files in the past, but perhaps things have improved since those bad old days (7 years ago). Either way, the format of a LAMMPS restart file has changed over time. So if you try to open a restart file that was saved years ago with the latest version of LAMMPS, you may be disappointed.

Cheers
Andrew