molecular data format

_Jeremy_A_Templeton · October 31, 2011, 3:48pm

Hi all, I want to do implement a fix that analyzes molecules. However, in going through the code and on-line documentation I couldn't find a few things I need. First, how do I determine the total number of molecules, including the local and total numbers? Do molecules have types and/or groups associated with them, i.e., is there an equivalent of the mask variable for them? Finally, are all the atoms of a molecule guaranteed to be on the same processor, or might they be distributed? I was a bit confused on this last point because the molecule tag for the atom is not included in pack_comm, but is included in pack_border.
Thanks,
Jeremy

_Ahmed_E_Ismail1 · October 31, 2011, 4:03pm

There’s no way to guarantee that a molecule will be stored entirely on one processor, unless its radius is smaller than the skin distance. Large chain molecules are almost guaranteed to be split between multiple processors.

Molecules don’t have “types,” although they do have ID’s by which atoms can be assigned to molecules. You can, however, define a group according to molecular ID’s. See the “group” command doc page for more information on the last point.

–AEI

_Jeremy_A_Templeton · October 31, 2011, 4:16pm

Thanks Ahmed. One follow up then:

There’s no way to guarantee that a molecule will be stored entirely on one processor, unless its radius is smaller than the skin distance. Large chain molecules are almost guaranteed to be split between multiple processors.

If this is the case, why is the per-atom molecule tag not communicated in pack_comm? Isn’t this needed so that molecules with atoms on different processors can keep track of each other?
Jeremy

_Ahmed_E_Ismail · October 31, 2011, 4:29pm

Steve would need to comment on this to be definitive, but my guess is that there isn’t really anything currently in the dynamics that requires atoms to know that another atom is part of molecule X or part of molecule Y. Anything that would require that kind of information probably gets it from another available source (by membership in a group or belonging to a neighbor list, for instance).

–AEI

_Christian_Muller · October 31, 2011, 5:01pm

Hi

the reason is that the molecule tag (as well as other things like mass, type, tag and so on) are not communicated with pack_comm is that they dont change over time. So you only need to transfer those values at timesteps where atoms are actually exchanged between processors (these are the same timesteps in which neighbor lists are rebuild). For the local atoms this is done by pack_exchange for the ghost atoms by pack_border. Hence, as you said, the molecule tag is only send with those.

The pack_comm is part of the routines updating the positions of ghost atoms on neighboring processors. These positions must be updated every timestep.

Cheers
Christian

-------- Original-Nachricht --------

_Jeremy_A_Templeton · November 1, 2011, 4:57am

Ok, thanks. One bit didn't get answered, and that's how to determine how many molecules there are, both locally (to the extent that term applies) and globally. Thanks for all your responses.
Jeremy

_Christian_Muller · November 1, 2011, 8:20am

Hi

I am 99% sure there is no variable stored anywhere giving the number of molecules either globally or locally. The molecule tag is just that, a tag. Molecules are more like pre defined groups of atoms, and not itentities in itself. No functions of LAMMPS work specifically with molecules as far as I know.

But if you write your own fix you can just scan through all atoms and count the number of atoms in each molecule (i.e. with the same molecule tag), how many different molecule tags are present on a processor, and how many exist globally (need some MPI for thath though). You'd want to do thath in the pre_neighbor() routine of a fix (thats only called in timesteps where reneighboring and atom exchange betweeen processors happen).

Cheers
Christian

sjplimp · November 1, 2011, 1:54pm

There are several computes that do per-molecule calculations. They
call compute::molecules_in_group() to get a global count of
molecules. Local # of molecules isn't well defined, since molecules
straddle processors. Such computes produce a global list
of values per molecule. I wouldn't use a fix for this unless there
is some persistence across timesteps needed. Even then, a compute
may be the better choice. E.g. compute msd uses a fix in the background
to simple store the original coords.

Steve

_Jeremy_A_Templeton · November 1, 2011, 3:10pm

Ok, I think I've mostly got it then. One last question: is there a way to get the number of atoms per-molecule? That will make coding a lot easier.
Jeremy

ps Steve, this is for atc, so it's definitely going to be a fix

akohlmey · November 1, 2011, 3:43pm

jeremy,

Ok, I think I've mostly got it then. One last question: is there a way to get the number of atoms per-molecule? That will make coding a lot easier.

that would assume that all molecules will be the same.
in that case it is simply atom->natoms/<number of molecules>

one subtlety that i don't think was pointed out properly
in this discussion is that there is no "molecule type"
property (it might be worth to add this). the molecule
id just groups atoms. in input converted from bio-simulations
those "molecules" may actually be residues, i.e. parts
of a molecule. i am not certain, but i don't recall seeing
a test for what is in the molecule tag.

you may have to deal with mixtures of molecules or
molecules and atoms. it can get pretty messy,
if you want to take care of any possible case. the
data that LAMMPS provides internally may not
be sufficient.

one way of handling this kind of situation is what
is done in VMD. there you have three sets of properties:
1) data provided by the user and that can be and is tested
  for consistency
2) data provided by the user that is not tested but just
  treated as an opaque property that can be used in
  VMD scripting applications
3) data computed by VMD itself based on its own set
  of assumptions and "rules"

for example VMD has a "resid" which is whatever
the input data provides and falls into category 2)
and a "residue" (number) property that is computed
by VMD. the latter is guaranteed to be a unique
residue identifier, "resid" is not. scripts that use
the former often get confused, since there are
often multiple residues with the same resid.

similarly, VMD computes a "fragment" property
(since the term "molecule" was already taken
- for historical reasons - to describe an entire
molecular data set) and gives each entity that
is connected via bonds a unique fragment number,
which is the cleanest way in VMD to identify a
molecule.

what does this mean for LAMMPS?
if you need to properly identify molecules,
you will likely need to compute and maintain
the relevant properties for yourself, since i
don't believe that what LAMMPS currently
provides is sufficiently self consistent to
uniquely identify molecules and molecule
types. from looking at what the granular
and peridynamics pair styles do, you can
probably do that from within a fix. i would
use that to maintain a list of additional
per atom properties formatted as needed.
i would first compile a list of "fragments",
i.e. determine atom clusters based on
their bonding patterns (this may need to
be extended for use with reactive potentials)
and then you can identify molecule types
by comparing fragments based on atom
types and bond/angle/dihedral/etc types.
whatever has the same "type" patters
would be one type of molecule. that may
be inconvenient at times, but it will be
much more flexible than having to do what
is done for TIP4P water potentials, where
you have restrictions on how you can
specify your input.

hope that helps and didn't confuse too much,
axel.