Cluster analysis

system · November 18, 2012, 1:20pm

Hello everyone!
I need to do some in space cluster analysis (eirther agglomerative or divisive). Do you know if there is any tool or software to do that as a postprocessing step?
Thanks a lot,
Davide

Andrew_Jewett · November 18, 2012, 4:21pm

Not enough information.
Are you trying to cluster the positions of atoms, or the conformations
of entire molecules?

If you are clustering molecule conformations, are you trying to
cluster the conformations of one large molecule? (Do you want to use
an RMSD-type-metric or something else?) Or are you clustering docking
conformations between two or more molecules?

I don't think the pizza.py tools do clustering, at least not for
molecule conformations.

I don't know if this is helpful, but I found that VMD has a "cluster" plugin:
http://physiology.med.cornell.edu/faculty/hweinstein/vmdplugins/clustering/
(I don't why this plugin is not distributed with VMD.)
...You could use topotools to generate a PSF file for your system
(which VMD needs to view your trajectories), and then use VMD to load
your trajectory. Then I assume you could use this "cluster" plugin on
your trajectory? The topotools plugin comes with all recent versions
of VMD, and is very useful:
https://sites.google.com/site/akohlmey/software/topotools

If that does not work, you might need to save (or convert) your
trajectory to DCD (or even PDB) format and use the post-processing
tools available for these trajectory format. I could be wrong, but
LAMMPS does not yet have a massive ecosystem of programs to support
its formats the way GROMACS, AMBER, CHARMM, NAMD do. But the VMD
plugins should help a lot.

Cheers
Andrew

P.S. When you save a LAMMPS trajectory file, you can limit the file
to contain only the atoms you are interested in using for your
clustering analysis by. (To do this, provide the "dump" command a
group name which is not "all".)
See http://lammps.sandia.gov/doc/dump.html

system · November 18, 2012, 6:14pm

My need is to see if certain moleules aggregate with each other and what the size of the aggregate is. To do so, I thought using cluster analysis could be a good idea. Is there any other tool or calculation I can use to reach my purpose?
Thanks!

Andrew Jewett <[email protected]...> ha scritto:

Andrew_Jewett · November 18, 2012, 6:45pm

Thanks, that helps. I assume you want to do this for each frame in
the simulation. (eg periodically).
I can't think of a way without writing a short script to process each
frame of your dump file. I hope other people more knowledgeable will
not overlook your post because I responded to it.

Here is one approach: You could use "fix bond/create" and "fix
bond/break" to create a bond between atoms in different molecules
whenever those atoms get within a certain distance from each other.
You can select which kinds of atoms you want to use for this purpose.
http://lammps.sandia.gov/doc/fix_bond_create.html
http://lammps.sandia.gov/doc/fix_bond_break.html

It is not necessary that these bonds exert forces on the atoms. (You
could use a harmonic bond with force-constant k=0). These bonds
simply give you an easy way to keep track of which atoms are within a
certain distance from each other. Then at regular intervals, you
could print out the entire list of bonds this way:

http://lammps.sandia.gov/doc/compute_bond_local.html

Then you could write your own script to parse the dump file, figure
out which atoms are bonded together, and then figure out the size of
each cluster. This might not be the easiest way to do this. But this
approach should be fast (O(n) instead of O(n^2)). Either way, you
would have to write a little bit of code.

Incidentally, you can use "fix bond/create" and "fix bond/break" later
(even if you already ran your simulation), by using the "rerun"
command.
http://lammps.sandia.gov/doc/rerun.html

Andrew

P.S. I started to write a fix which will change the moleculeID numbers
of atoms belonging to a molecule when that molecule forms a "bond"
with another molecule, so that the two molecules have the same
moleculeID. Then I got side-tracked. It was meant to be used with
"fix bond/create" and "fix bond/break". If I get around to finishing
it, I'll post it here.

akohlmey · November 18, 2012, 7:07pm

Thanks, that helps. I assume you want to do this for each frame in
the simulation. (eg periodically).
I can't think of a way without writing a short script to process each
frame of your dump file. I hope other people more knowledgeable will
not overlook your post because I responded to it.

the analysis whether molecules form clusters
or not, is far from trivial. a simple distance based
analysis is likely to produce unexpected results.

i would suggest to have a look at a paper written
by some colleagues of mine, where they look at
micellization of coarse grained surfactants.

http://dx.doi.org/10.1021/ct2005193

Here is one approach: You could use "fix bond/create" and "fix
bond/break" to create a bond between atoms in different molecules
whenever those atoms get within a certain distance from each other.
You can select which kinds of atoms you want to use for this purpose.
http://lammps.sandia.gov/doc/fix_bond_create.html
http://lammps.sandia.gov/doc/fix_bond_break.html

It is not necessary that these bonds exert forces on the atoms. (You
could use a harmonic bond with force-constant k=0). These bonds
simply give you an easy way to keep track of which atoms are within a

warning, this will change the interactions, since the new bond
with have the non-bonded 1-2 interactions excluded. this is
very likely not desired.

certain distance from each other. Then at regular intervals, you
could print out the entire list of bonds this way:

http://lammps.sandia.gov/doc/compute_bond_local.html

Then you could write your own script to parse the dump file, figure
out which atoms are bonded together, and then figure out the size of
each cluster. This might not be the easiest way to do this. But this
approach should be fast (O(n) instead of O(n^2)). Either way, you
would have to write a little bit of code.

Incidentally, you can use "fix bond/create" and "fix bond/break" later
(even if you already ran your simulation), by using the "rerun"
command.
http://lammps.sandia.gov/doc/rerun.html

that would be the only "safe" way, but there is still
significant programming effort. at this point it would
be just as simple to program a little grid search as
well, and do the whole analysis in one go right off
the trajectory.

axel.