[lammps-users] More than 32 groups

_Anthony_Costa · September 22, 2010, 11:02pm

Does anyone know the origin of this restriction?

I am in an odd situation where I truly do need more than 32 groups
(deleting and recreating won't do here). So, I simply incremented the
MAX_GROUP preprocessor definition in group.cpp and recompiled, noting
that everything in group.cpp seemed general enough to handle an
arbitrary number of groups. However, things get screwy in the groups
beyond number 32. For example, I specify 3 atoms for the 33rd group
and the code believes the group contains every atom in my system.

Amazingly though, the calculation of the property I am interested in
is actually correct in the 33rd group for the total system, that's
just what I specified! The program just seems to be confused about who
belongs to the group beyond 32.

Any thoughts would be welcome,

Anthony

akohlmey · September 22, 2010, 11:19pm

Does anyone know the origin of this restriction?

yes. the groups are implemented as bitmasks on integers
that are 32-bit on all but a few current systems.

I am in an odd situation where I truly do need more than 32 groups
(deleting and recreating won't do here). So, I simply incremented the

this question has come up many times and so far every request
for a larger number of groups could be resolved in a different way.
based on that history, i would like to cast some doubt on your statement
about truly needing more groups.

please explain what you want to do and then perhaps there is a
different way to do the same thing.

MAX_GROUP preprocessor definition in group.cpp and recompiled, noting
that everything in group.cpp seemed general enough to handle an
arbitrary number of groups. However, things get screwy in the groups
beyond number 32. For example, I specify 3 atoms for the 33rd group
and the code believes the group contains every atom in my system.

Amazingly though, the calculation of the property I am interested in
is actually correct in the 33rd group for the total system, that's
just what I specified! The program just seems to be confused about who
belongs to the group beyond 32.

you should never do modifications without knowing what the implications are
and then be surprised that it doesn't work. as i wrote above, the bitmask
arithmetic will fail on groups beyond index 31. your group 32 (the 33rd group)
will have the same bitmask as group 0, which is predefined to be "all" atoms.
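
To illustrate the bitmask point, here is a minimal sketch in Python. It is not LAMMPS code: the names `group_bit`, `add_to_group`, and `members` are hypothetical, and the wrap-around of the shift count is an assumption. In C++, shifting a 32-bit int by 32 or more is undefined behavior; x86 hardware typically uses only the low 5 bits of the shift count, which would be consistent with group 32 landing on the same bit as group 0.

```python
INT_BITS = 32  # assumed width of the C int that stores each atom's group bits


def group_bit(igroup):
    """Bit for group index igroup, emulating a 32-bit shift whose count wraps.

    The modulo is an assumption that mimics typical x86 behavior; the C++
    standard leaves such a shift undefined.
    """
    return 1 << (igroup % INT_BITS)


# Ten atoms, all of which start out in group 0 ("all").
masks = {atom_id: group_bit(0) for atom_id in range(1, 11)}


def add_to_group(atom_ids, igroup):
    """Set the group's bit on each listed atom."""
    for atom_id in atom_ids:
        masks[atom_id] |= group_bit(igroup)


def members(igroup):
    """Atoms whose mask has the group's bit set."""
    return sorted(a for a, m in masks.items() if m & group_bit(igroup))


add_to_group([1, 2, 3], 5)    # a group index below 32 behaves as expected
add_to_group([1, 2, 3], 32)   # the 33rd group shares its bit with group 0

print(members(5))    # only atoms 1, 2, 3
print(members(32))   # every atom, because the test hits the "all" bit
```

Under these assumptions the membership test for the 33rd group matches every atom, which is the symptom reported above.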

cheers,
axel.

_Anthony_Costa · September 22, 2010, 11:57pm

yes. the groups are implemented as bitmasks on integers
that are 32-bit on all but a few current systems.

Makes sense, thanks.

this question has come up many times and so far every request
for a larger number of groups could be resolved in a different way.
based on that history, i would like to cast some doubt on your statement
about truly needing more groups.

Great - perhaps my searching was insufficient, though I didn't see
something that fit my need. My situation is incredibly simple: I need
output of computes for each group for every time step. I specify
groups on a per-atom basis. Works great for N<32 groups, a previous
example of mine for the PE and KE is:

[...]
group grp101 id 1 2 3 4 5
group grp102 id 6 7 8 9 10
group grp103 id 11 12 13 14 15
[...]
compute 1 grp101 ke
compute 2 grp101 pe/atom
compute 3 grp101 reduce sum c_2
compute 4 grp102 ke
compute 5 grp102 pe/atom
compute 6 grp102 reduce sum c_5
compute 7 grp103 ke
compute 8 grp103 pe/atom
[...]
thermo 1
thermo_style custom c_1 c_3 c_6 c_9 [...]

you should never do modifications without knowing what the implications are
and then be surprised that it doesn't work.

Certainly, and I didn't actually expect it to work. But, as it took
less time to do this than it did to write this email, I didn't much
see the harm. Had it seemed to work, I would have verified!

Best,
Anthony

akohlmey · September 23, 2010, 12:39am

anthony,

you have only described what you are doing, but not what
you want to achieve. why do you need this information in
every step? this is highly inefficient and will kill performance,
particularly in parallel.

that all being said, i would assume that the best way to
approach what you are currently doing, is to do this in
postprocessing (remember that the data files will be huuuuggge).
you can compute ke/atom and pe/atom and then output this
as per-atom property in the dump file and then write a simple
program to aggregate that info according to your group definitions.
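
A minimal sketch of such an aggregation program, in Python. The group definitions reuse the atom IDs from the input excerpt earlier in the thread; the `rows` values are made up, and the `(atom_id, ke, pe)` row layout is an assumption about how the per-atom dump data would be read in, not a LAMMPS file format.

```python
# Group definitions taken from the input excerpt: group name -> atom IDs.
groups = {
    "grp101": [1, 2, 3, 4, 5],
    "grp102": [6, 7, 8, 9, 10],
    "grp103": [11, 12, 13, 14, 15],
}


def aggregate(rows, groups):
    """Sum per-atom (ke, pe) values into per-group [ke, pe] totals.

    rows: iterable of (atom_id, ke, pe) tuples for a single snapshot.
    An atom may belong to several groups, or to none.
    """
    owners = {}
    for name, atom_ids in groups.items():
        for atom_id in atom_ids:
            owners.setdefault(atom_id, []).append(name)

    totals = {name: [0.0, 0.0] for name in groups}
    for atom_id, ke, pe in rows:
        for name in owners.get(atom_id, []):
            totals[name][0] += ke
            totals[name][1] += pe
    return totals


# Made-up per-atom values for one snapshot (hypothetical data).
rows = [(atom_id, 0.5, -1.0) for atom_id in range(1, 16)]
totals = aggregate(rows, groups)
for name, (ke, pe) in totals.items():
    print(name, ke, pe)
```

Because the grouping lives in an ordinary dictionary, the number of groups is not tied to the width of any integer.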

if i assume that those numbers are just going to be input to some
other calculation, then a better approach would be to write a
single custom compute that operates on atom group all,
takes the various group definitions (molecules?) as input,
and then computes and collects the desired information as needed.
that would take some more effort, but would avoid the costly
writing and reading of text files.

cheers,
axel.

_Anthony_Costa · September 23, 2010, 1:18am

you have only described what you are doing, but not what
you want to achieve. why do you need this information in
every step? this is highly inefficient and will kill performance,
particularly in parallel.

I think we are confused a bit here. My question was a completely
technical one regarding the feasibility of outputting > 32 group
computes during the computation of a trajectory at every time step. My
understanding of these forums is that their exact purpose is to address
technical issues regarding the use of the software, and not the
science. I am fully aware of the computational requirements for
dealing with such data and have the means to deal with them, both in
storage and performance. My scientific reasons for wanting this data
are irrelevant to the current question.

That being said, let me rephrase. Is what I'm asking possible? Is it,
perhaps, possible when I don't want every time step, but every time
step mod N where N > 1? For my purposes I can deal with situations
where N > 1.

Otherwise, on to your next point ...

that all being said, i would assume that the best way to
approach what you are currently doing, is to do this in
postprocessing (remember that the data files will be huuuuggge).
you can compute ke/atom and pe/atom and then output this
as per-atom property in the dump file and then write a simple
program to aggregate that info according to your group definitions.

This is a reasonable suggestion but it is not related to my question,
and you're right that these files would be huge. I could deal with
them, yes, but I'd much rather not if it is technically possible
within memory in the calculation, and you suggest a method I'm not
familiar with in your next point ...

if i assume that those numbers are just going to be input to some
other calculation, then a better approach would be to write a
single custom compute that operates on atom group all,
takes the various group definitions (molecules?) as input,
and then computes and collects the desired information as needed.
that would take some more effort, but would avoid the costly
writing and reading of text files.

This seems like an excellent suggestion and is exactly in line with my
original question. Yes they are molecules. I have absolutely no idea
how to achieve this. Any reference to previous list items would be
very welcome.

Also, you mentioned that "the groups are implemented as bitmasks on
integers that are 32-bit on all but a few current systems." For what
systems are they implemented differently? I have many computational
options.

Thanks again for your time, and all the best,
Anthony

sjplimp · September 23, 2010, 1:08pm

In several cases, LAMMPS commands get around
the limitation on groups by using the molecule ID
instead (which is assigned by you and need not
have anything to do with physical molecules). See
the compute com/molecule or compute msd/molecule
for example. These create a vector of values, which
can then be processed (e.g. time averaged, summed, output)
by many other LAMMPS commands.

So you could write your own compute foo/molecule,
using one of those as a template, and calculate whatever
you want on the atoms in each molecule to create
a per-molecule quantity. If what you want is the result
of some other per-atom compute (e.g. pe or ke), summed
by molecule, then this might be a general capability of
interest to general users.
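
The idea of summing a per-atom quantity by molecule ID can be sketched as follows, in Python rather than LAMMPS C++. The function names are hypothetical, molecule IDs are assumed to run from 1 to `nmolecules`, and the two-step structure (each processor sums its own atoms, then the partial vectors are added together) is an assumption about how a parallel per-molecule compute would be organized.

```python
def local_molecule_sums(mol_ids, values, nmolecules):
    """Sum per-atom values by molecule ID for the atoms one processor owns."""
    sums = [0.0] * nmolecules
    for mol_id, value in zip(mol_ids, values):
        sums[mol_id - 1] += value  # molecule IDs assumed to be 1..nmolecules
    return sums


def reduce_across_procs(partials):
    """Element-wise sum of the per-processor vectors (stand-in for an MPI sum)."""
    return [sum(column) for column in zip(*partials)]


# Two hypothetical processors, each holding some atoms of three molecules.
proc0 = local_molecule_sums([1, 1, 2], [1.0, 2.0, 4.0], nmolecules=3)
proc1 = local_molecule_sums([2, 3, 3], [8.0, 16.0, 32.0], nmolecules=3)

per_molecule = reduce_across_procs([proc0, proc1])
print(per_molecule)  # one value per molecule
```

The result is a vector with one entry per molecule, so its length is set by the number of molecule IDs and not by the 32-bit group mask.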

Steve

akohlmey · September 23, 2010, 3:54pm

I think we are confused a bit here. My question was a completely
technical one regarding the feasibility of outputting > 32 group
computes during the computation of a trajectory at every time step. My
understanding of these forums is that their exact purpose is to address
technical issues regarding the use of the software, and not the
science. I am fully aware of the computational requirements for
dealing with such data and have the means to deal with them, both in
storage and performance. My scientific reasons for wanting this data
are irrelevant to the current question.

anthony,

sorry, but here we have a difference in understanding.

i already told you that what you were asking for is
practically impossible with the current LAMMPS code
base.

if you want help beyond that you have to "pay the price"
of sharing your intentions and "i want to do it" doesn't
cut it for me. i volunteer my time to help people, because
understanding their needs helps me (and others) to write
better code and anticipate future needs better.

you are certainly free to not share any details of your
scientific motivation, but then i will execute the equivalent
freedom to just ignore your request.

open source is a "quid pro quo" deal.

i am willing to help people w/o that condition only
if i get compensated for my time.

That being said, let me rephrase. Is what I'm asking possible? Is it,
perhaps, possible when I don't want every time step, but every time
step mod N where N > 1? For my purposes I can deal with situations
where N > 1.

this has nothing to do with time steps.

Otherwise, on to your next point ...

that all being said, i would assume that the best way to
approach what you are currently doing, is to do this in
postprocessing (remember that the data files will be huuuuggge).
you can compute ke/atom and pe/atom and then output this
as per-atom property in the dump file and then write a simple
program to aggregate that info according to your group definitions.

This is a reasonable suggestion but it is not related to my question,
and you're right that these files would be huge. I could deal with
them, yes, but I'd much rather not if it is technically possible
within memory in the calculation, and you suggest a method I'm not
familiar with in your next point ...

what you are asking for would create files of almost the same size.

if i assume that those numbers are just going to be input to some
other calculation, then a better approach would be to write a
single custom compute that operates on atom group all,
takes the various group definitions (molecules?) as input,
and then computes and collects the desired information as needed.
that would take some more effort, but would avoid the costly
writing and reading of text files.

This seems like an excellent suggestion and is exactly in line with my
original question. Yes they are molecules. I have absolutely no idea
how to achieve this. Any reference to previous list items would be
very welcome.

the LAMMPS source code is your reference. there are per
molecule computes that could serve as an example.

Also, you mentioned that "the groups are implemented as bitmasks on
integers that are 32-bit on all but a few current systems." For what
systems are they implemented differently? I have many computational
options.

you would have to have a platform for which
the following statement is true: sizeof(int) == 8.
you can trigger this on several setups through a compiler
flag, but then you would need a matching MPI installation,
and on top of that you would have to carefully debug
the LAMMPS code, because i am certain that there may
be places left where sizeof(int) is assumed to be no larger
than 4. if you read through the source code this should
become apparent very quickly.
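
A quick way to see what a given platform offers is to ask for the size of a C int. The sketch below uses Python's `ctypes` for that; it reports the default C ABI of the Python build on that machine, not what a particular compiler flag would produce, and the one-bit-per-group arithmetic simply restates the bitmask explanation from earlier in the thread.

```python
import ctypes

# Size in bytes of a C int on this platform's default ABI.
int_bytes = ctypes.sizeof(ctypes.c_int)

# With one bit per group, the int width caps the number of groups.
max_groups = 8 * int_bytes

print("sizeof(int) =", int_bytes, "bytes")
print("bits available for group masks =", max_groups)
```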

cheers,
axel.

_Anthony_Costa · September 23, 2010, 4:16pm

i already told you that what you were asking for is
practically impossible with the current LAMMPS code
base.

Sorry I didn't get this from your previous email, I instead only read
it as "this is the right way to do it depending on the science you
want".

if you want help beyond that you have to "pay the price"
of sharing your intentions and "i want to do it" doesn't
cut it for me. i volunteer my time to help people, because
understanding their needs helps me (and others) to write
better code and anticipate future needs better.

Fair enough.

I am simply writing the new computes. I already have ke/molecule done
and pe shouldn't be too much more difficult. I will of course put them
up on the mailing list when they are done and their accuracy verified.

Best,
Anthony

akohlmey · September 23, 2010, 4:28pm

if you want help beyond that you have to "pay the price"
of sharing your intentions and "i want to do it" doesn't
cut it for me. i volunteer my time to help people, because
understanding their needs helps me (and others) to write
better code and anticipate future needs better.

Fair enough.

I am simply writing the new computes. I already have ke/molecule done
and pe shouldn't be too much more difficult. I will of course put them
up on the mailing list when they are done and their accuracy verified.

excellent! i am certain that there are other (current and future)
LAMMPS users that would appreciate such a contribution.
if you browse through the existing sources, you will see that
quite a bit of the code in there has been contributed by users
like you, who needed a specific problem fixed.

thanks,
axel.

_Zhun-Yong_ONG · September 23, 2010, 4:45pm

I think someone might have actually written the code for the
pe/molecule and/or ke/molecule compute. Just ask around.

Zhun-Yong

_Anthony_Costa · September 23, 2010, 6:01pm

Well I've already done ke/molecule, though pe/molecule is a bit more
complex it turns out. So, if someone has done this, let me know
please!

Anthony