[lammps-users] MDAnalysis (slightly OT)

Craig_Maloney1 · May 27, 2008, 3:53pm

Hi all.

I’d like to do some post-processing which involves, e.g., assembling the full trajectory for an arbitrary particle in my system. I keep the dumped output of my entire MD run in a single dump file which is too big to fit into system memory. With ascii-based output (e.g. LAMMPS default or custom dump), assembling a single-particle trajectory is a slow operation, since the code needs to read line-by-line, so the run time scales like numAtoms*numTimesteps even if I just want to look at the first atom. I figured there would be some binary based format, e.g. DCD, which would be much more amenable to this kind of post-processing. I’d like to work in python if possible but that’s not strictly necessary.

There is MDTools (http://www.ks.uiuc.edu/Development/MDTools/Python/), but it looks like it can only read all particles for a single given DCD frame, rather than tracking a single particle for all frames. I might be wrong, but this is the impression I get when glancing at the docs. Am I right?

I just came across MDAnalysis: http://code.google.com/p/mdanalysis/
Anyone have any experience with it? Will it assembly single-particle trajectories from a DCD file (or other binary format) in reasonable amounts of time (order numTimesteps rather than order numAtoms*numTimesteps)?

Thanks,
Craig

_Naveen_Michaud-Agra · May 27, 2008, 5:49pm

Hi Craig,

If you look at the MDAnalysis docs (http://code.google.com/p/mdanalysis/wiki/Documentation) and go down to the timeseries section you’ll see that what you want is possible (the timeseries code is inspired by Charmm’s capabilities, so you can almost do all the same analyses). Since each timestep in a DCD file is fixed in size, it is possible to extract a timeseries of a subset of atoms in order numTimesteps - as long as the overall timeseries can fit in memory, since it’s preallocated as a numpy array)

For example (assuming you have a psf and dcd file generated):

from MDAnalysis import *

universe = AtomGroup.Universe("psf file", "dcd file")

atom_1_5 = universe.selectAtoms("bynum 1:5") # selects the first five atoms - see the selection stuff in the docs (based on Charmms selection capabilities)
coordinates = universe.dcd.timeseries(atom_1_5, start=0, stop=-1, skip=1, format="afc")  # format = [a]tom, [f]rame, [c]oordinate - you can use all 6 permutations depending on what you want to do

and coordinates is just a 5 x N x 3 numpy array, where N is the number of timesteps in the trajectory. Is this what you are looking for?

Naveen Michaud-Agrawal

akohlmey · May 27, 2008, 5:54pm

craig,

please don't crosspost to lists where the bisection of subscribers is small.

Hi all.
I'd like to do some post-processing which involves, e.g., assembling the
full trajectory for an arbitrary particle in my system. I keep the dumped
output of my entire MD run in a single dump file which is too big to fit
into system memory. With ascii-based output (e.g. LAMMPS default or custom
dump), assembling a single-particle trajectory is a slow operation, since
the code needs to read line-by-line, so the run time scales like
numAtoms*numTimesteps even if I just want to look at the first atom. I
figured there would be some binary based format, e.g. DCD, which would be
much more amenable to this kind of post-processing. I'd like to work in

using VMD and a tool/script called bigdcd is probably the way to go.

you only load the trajectory frame by frame and VMD can handle .dcd and .xtc
binary trajectories (no LAMMPS binary, yet).

i also have a hack that allows writing .dcd files of a LAMMPS group
and thus would allow this kind of processing during the LAMMPS run
(unless you need a more complicated way to select the atom).

cheers,
axel.

sjplimp · May 28, 2008, 1:37pm

If what you want to do is extract one atom's coords from each snapshot
of a dump file, you could probably do a grep command with the
correct regular expression (using the atom ID) and grab them very quickly.

The Pizza.py package would also let you do this, one snapshot at a time,
but it would still have to read the whole file and be much slower than grep.

Steve

Craig_Maloney1 · May 28, 2008, 1:39pm

Hi Naveen.

This is exactly what I’d like to do! A single particle trajectory should fit nicely into physical memory. I’ll need to figure out what a psf file is. For some reason I’m having trouble installing numpy on my workstation, so I can’t check it right this second, but I’ll let you know if it works.

Thanks much,
Craig