dump to hdf5 database

https://gforge.accre.vanderbilt.edu/plugins/scmsvn/viewcvs.php/?root=lammpstools

Using the dump file reader I posted, I wrote a script that transfers the data into an hdf5 database. Download the two py files to the same directory. Again, the dump file names should be numbered in sequence, like: 1.dump 2.dump 5.dump…
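
The naming matters because a plain alphabetical sort would put 10.dump ahead of 2.dump. Here is a minimal sketch of numeric ordering, assuming the part before .dump is always a plain integer. The helper name order_dumps is made up for illustration and is not taken from dumps2hdf5.py:

```python
import os

def order_dumps(names):
    """Sort dump file names by their integer prefix: 1.dump, 2.dump, 10.dump."""
    def step(name):
        # "5.dump" -> 5; assumes the stem is a plain integer
        return int(os.path.basename(name).split(".")[0])
    return sorted(names, key=step)
```

Something like order_dumps(glob.glob("*.dump")) would then give the files in sequence order.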

The simplest way to use it is to run python dumps2hdf5.py from the directory containing the dump files. A temporary database is generated so that the transfer can resume if it is interrupted. The final db name defaults to atoms.hdf5. Options let you delete the dump files, change the source directory and the destination db file, change the dump file match pattern (default *.dump), and set the number of timesteps read at one time.
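
To illustrate the "number of timesteps read at one time" idea: processing in fixed-size batches keeps memory use bounded, and each finished batch is a natural point from which an interrupted run could pick up again. This is only a sketch of the general pattern; the function name batches is hypothetical and the script may well do it differently:

```python
def batches(items, size):
    """Yield successive lists holding at most `size` items each."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # leftover items that did not fill a whole batch
        yield batch
```

Each yielded batch would be parsed and written to the database before the next one is read.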

You will need h5py installed (which in turn requires numpy). My testing has been on Python 2.6.

Performance is a barely acceptable ~1 MB/sec. The bottleneck is the text file processing, but I don’t know how to make it faster and I don’t want to spend more time on it; I’ve already made an effort to optimize it. The db file I got was about half the size of the text file(s).

Run python dumps2hdf5.py --help for options.

Feedback is appreciated.

please remind me. if you are concerned about reading speed, what is
the reason to first write to a text mode dump and then use python to
convert it to hdf? why not simply write a dump class that writes the
hdf file directly?

axel.

That would be ideal, and I did mention it some time ago. But I don't have enough C experience to do that, nor do I know enough about the inner workings of LAMMPS.

One could use the other binary outputs of the LAMMPS dump command, but they are not as flexible as the text dump files in terms of output data types.

Oh, I just noticed my script isn't going to process triclinic boxes or molecular topology.
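
For anyone who wants to add triclinic support: as far as I can tell from the LAMMPS dump documentation, a triclinic text dump marks its box header with "xy xz yz" and adds a third column (the tilt factor) to each of the three bounds lines. Below is a rough sketch of reading that section. The function name parse_box is hypothetical, this is not code from dumps2hdf5.py, and older LAMMPS versions may write the header differently:

```python
def parse_box(header, lines):
    """Parse the BOX BOUNDS section of a LAMMPS text dump.

    header: the "ITEM: BOX BOUNDS ..." line
    lines:  the three lines that follow it
    Returns (bounds, tilt); tilt is None for an orthogonal box.
    """
    # assumption: triclinic dumps carry "xy xz yz" in the header line
    triclinic = "xy xz yz" in header
    bounds, tilt = [], []
    for line in lines:
        fields = [float(x) for x in line.split()]
        # for triclinic boxes these are the bounding-box extents
        bounds.append((fields[0], fields[1]))
        if triclinic:
            tilt.append(fields[2])
    return bounds, (tuple(tilt) if triclinic else None)
```

Note that for a triclinic box the first two columns describe the bounding box, not the actual box edges, so they would need converting before being stored as box dimensions.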