structure factor compute?

Hi, all. Has anyone written a compute to calculate the static structure factor S(q) on-the-fly? Obviously it’s possible to do this in postprocessing, but it would be nice to be able to calculate it on the fly and then average it vs. time, analogous to compute rdf.

Also – not a LAMMPS question per se but I bet someone on this list has dealt with this problem. Afaict calculating S(q) is an O(Natom^2) operation, since

S(q) = sum_{j>i}^{Natom} e^{iq.(r_j-r_i)}.

You can also get S(q) by Fourier-transforming g(r), but an accurate calculation would again be O(N^2) since you would have to calculate g(r) over all atom pairs (i.e. not just those within the potential cutoff + skin as in compute rdf).

Is there a way around this O(N^2) scaling? How do people typically calculate S(q) for large systems? I have some vague idea that FFTs are relevant here but can’t see any way around the need to include all atom pairs in the calculation of S(q).
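(For concreteness, a brute-force post-processing sketch in numpy, using the orientationally averaged Debye form of the sum above and ignoring periodic images; the all-pairs arrays are where the O(N^2) cost shows up.)

```python
import numpy as np

def s_of_q_bruteforce(positions, q_values):
    """Orientationally averaged S(q) from the direct pair sum
    (Debye formula): S(q) = 1 + (2/N) * sum_{j<k} sin(q r_jk)/(q r_jk).
    Cost and memory are O(N^2), so this is only practical for small N.
    positions: (N, 3) array; q_values: 1D array of |q| values.
    """
    n = len(positions)
    diff = positions[:, None, :] - positions[None, :, :]   # all r_j - r_k
    r_jk = np.linalg.norm(diff, axis=-1)[np.triu_indices(n, k=1)]
    s_q = np.empty(len(q_values))
    for i, q in enumerate(q_values):
        x = q * r_jk
        s_q[i] = 1.0 + 2.0 / n * np.sinc(x / np.pi).sum()  # sinc(x/pi) = sin(x)/x
    return s_q
```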

Thanks,
Rob

Afaik there's no compute that does it. I turned this (http://code.google.com/p/dynsf/) up with a little Googling, but I have no idea if it's any good or not.

Alternatively, you could produce the RDF with LAMMPS and use a programming language of your choice to pass it through an FFT library. We've used a similar method to turn time series into time correlation functions in our group, and it's relatively easy.
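For example, something along these lines (a rough numpy sketch; it uses a direct numerical sine transform rather than an FFT library and assumes an isotropic system of number density rho, and as noted below a g(r) truncated at the pair cutoff only gives trustworthy S(q) at large q):

```python
import numpy as np

def s_of_q_from_gr(r, g, rho, q_values):
    """S(q) from a tabulated g(r), via
        S(q) = 1 + 4*pi*rho * int_0^rmax dr r^2 [g(r) - 1] sin(qr)/(qr).
    r, g: bin centers and values (e.g. from compute rdf + fix ave/time);
    rho: number density N/V.  q values much below ~pi/max(r) are
    unreliable because the integral is cut off at max(r).
    """
    h = g - 1.0
    s_q = np.empty(len(q_values))
    for i, q in enumerate(q_values):
        kernel = np.sinc(q * r / np.pi)          # sin(qr)/(qr), well-behaved at r=0
        s_q[i] = 1.0 + 4.0 * np.pi * rho * np.trapz(r**2 * h * kernel, r)
    return s_q
```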

Niall

Niall Jackson
PhD Student, Bresme Group (Chemical Physics Section)
Department of Chemistry, Imperial College London

Keep in mind it's a Fourier transform: you are only going to get info about very large wavenumbers if you perform an FFT on an rdf taken from LAMMPS. You need the entire system to compute this accurately, not just the atoms in the neighbor list.

> Hi, all. Has anyone written a compute to calculate the static structure
> factor S(q) on-the-fly? Obviously it's possible to do this in
> postprocessing, but it would be nice to be able to calculate it on the fly
> and then average it vs. time, analogous to compute rdf.
>
> Also -- not a LAMMPS question per se but I bet someone on this list has
> dealt with this problem. Afaict calculating S(q) is an O(Natom^2) operation,
> since
>
> S(q) = sum_{j>i}^{Natom} e^{iq.(r_j-r_i)}.
>
> You can also get S(q) by Fourier-transforming g(r), but an accurate
> calculation would again be O(N^2) since you would have to calculate g(r)
> over all atom pairs (i.e. not just those within the potential cutoff + skin
> as in compute rdf).
>
> Is there a way around this O(N^2) scaling? How do people typically

you can't easily get around the O(N**2), but you can make it
*much* less painful using GPUs. a former colleague of mine
implemented that for g(r) in VMD a couple of years back.
http://dx.doi.org/10.1016/j.jcp.2011.01.048

also, i have some good friends here in italy that do similar
things for neutron diffraction data; GPUs help to make it feasible
and to avoid artifacts from the fourier transform, not to mention
allow them to work with quasicrystals and other "strange" objects.

> calculate S(q) for large systems? I have some vague idea that FFTs are
> relevant here but can't see any way around the need to include all atom
> pairs in the calculation of S(q).

there are tricks for assembling the data in fourier space directly;
however, that will collide with the domain decomposition and require a
lot of additional communication.
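for illustration, one such fourier-space trick (a serial post-processing sketch, assuming an orthogonal periodic box and the usual 1/N normalization) is to accumulate the density modes rho(q) on the reciprocal lattice of the box; each |rho(q)|^2 already contains all pair terms at O(N) cost per q vector:

```python
import numpy as np

def s_of_q_density_modes(positions, box, nmax=8):
    """S(q) from density modes rho(q) = sum_j exp(i q . r_j), evaluated
    on the reciprocal lattice q = 2*pi*(nx/Lx, ny/Ly, nz/Lz) of an
    orthogonal box.  |rho(q)|^2 / N = 1 + (1/N) sum_{j!=k} e^{iq.(r_j-r_k)},
    so all pairs are included without an explicit O(N^2) pair loop.
    Values at equal |q| are averaged together.
    """
    positions = np.asarray(positions, dtype=float)
    box = np.asarray(box, dtype=float)
    n_atoms = len(positions)
    bins = {}                                      # |q| -> (sum of S, count)
    rng = range(-nmax, nmax + 1)
    for nx in rng:
        for ny in rng:
            for nz in rng:
                if nx == ny == nz == 0:
                    continue
                q = 2.0 * np.pi * np.array([nx, ny, nz]) / box
                rho_q = np.exp(1j * (positions @ q)).sum()
                s = (rho_q * rho_q.conjugate()).real / n_atoms
                key = round(float(np.linalg.norm(q)), 6)
                total, count = bins.get(key, (0.0, 0))
                bins[key] = (total + s, count + 1)
    q_mag = np.array(sorted(bins))
    s_q = np.array([bins[k][0] / bins[k][1] for k in q_mag])
    return q_mag, s_q
```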

axel.

Hi, everybody. Thanks! Yes, obviously one could use compute rdf and Fourier transform, but the short (pair + skin) cutoff inherent to this compute would mean (as noted) one could only get large wavenumbers, and might indeed produce "crappy" results given the nature of FTs.

I did look at dynsf. It looks nice. But I guess I should have been clearer about our situation. We already have a parallel code that does a nice job of calculating S(q) in postprocessing, but its utility is limited by the N^2 scaling. Axel's point about using GPUs is a good suggestion - would that we had someone who knew CUDA! :-)

Anyway, our parallel code works by loading all the atoms (from LAMMPS configuration dumps) into common memory, then calculating S(\vec{q}) for different \vec{q} on different processors, and then gathering all the S(\vec{q}) to produce S(q). The disadvantage of this approach is that a lot of configuration dumps are necessary to get good statistics. So I was wondering whether it would be worthwhile to implement this kind of algorithm in LAMMPS to calculate and average S(q) on-the-fly as a compute. One complication is (I think - correct me if I'm wrong) each processor would need to be able to access the position of each atom in order to get low wavenumbers. I'm not sure how easy this is to do within the current LAMMPS, i.e. do any fixes/computes gather all atom positions together onto each processor? Am I missing something?
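Roughly, the scheme looks like the mpi4py sketch below (illustrative only, not the actual code): every rank holds all positions read from a dump, the q vectors are split round-robin across ranks, and the partial results are gathered at the end.

```python
import numpy as np
from mpi4py import MPI

def s_single_q(positions, q):
    """Per-q-vector work, done the brute-force way: the full double sum
    over atom pairs (self terms included, 1/N normalization), i.e. the
    O(N^2) part."""
    phase = positions @ q                         # q . r_j for every atom
    dphi = phase[:, None] - phase[None, :]        # q . (r_j - r_k) for all pairs
    return float(np.cos(dphi).sum()) / len(positions)

def s_of_q_parallel(positions, q_vectors):
    """Distribute the q vectors over MPI ranks; every rank sees all atoms."""
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    my_s = np.array([s_single_q(positions, q) for q in q_vectors[rank::size]])
    chunks = comm.gather(my_s, root=0)            # collect partial results
    if rank != 0:
        return None
    s_q = np.empty(len(q_vectors))
    for r, chunk in enumerate(chunks):
        s_q[r::size] = chunk                      # undo the round-robin split
    return s_q
```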

Does anyone have a good idea of where to start for writing a compute to compute S(q) in parallel within LAMMPS? (directly, including all particle pairs). I.e. would this need to be done “from scratch”?

rob,

> Hi, everybody. Thanks! Yes, obviously one could use compute rdf and
> Fourier transform, but the short (pair + skin) cutoff inherent to this compute
> would mean (as noted) one could only get large wavenumbers, and might indeed
> produce "crappy" results given the nature of FTs.
>
> I did look at dynsf. It looks nice. But I guess I should have been clearer
> about our situation. We already have a parallel code that does a nice job
> of calculating S(q) in postprocessing, but its utility is limited by the
> N^2 scaling. Axel's point about using GPUs is a good suggestion - would that
> we had someone who knew CUDA! :-)

it is easier than you think, and in my personal experience, spending
some time porting a code to GPUs (be it with CUDA, OpenCL or any other
current or future tool) also helps you write better code for
(multi-core) CPUs (with vector units).

> Anyway, our parallel code works by loading _all the atoms_ (from LAMMPS
> configuration dumps) into common memory, then calculating S(\vec{q}) for
> different \vec{q} on different processors, and then gathering all the
> S(\vec{q}) to produce S(q). The disadvantage of this approach is that a lot
> of configuration dumps are necessary to get good statistics. So I was
> wondering whether it would be worthwhile to implement this kind of algorithm
> in LAMMPS to calculate and average S(q) on-the-fly as a compute. One

i don't think it is worth it. your point about the statistics is missing one important component: statistical relevance. simply sampling data more often does not necessarily improve statistics. the individual samples also have to be _statistically relevant_, preferably completely independent. however, in MD they are not, at least not when they are sampled in close succession. let's just consider the two extreme cases:

1) you have long-range structural features in your system. that would mean that you need to extend the g(r) to long r, but that would *also* mean that you have strong correlation between time steps, and thus more frequent collection of data won't improve your results much, if at all.

2) you don't have long-range structural features. that would make it more likely you can benefit from on-the-fly analysis, but in this case, you can use the existing rdf compute. please note that you can make long-r rdf computations more affordable in LAMMPS by using the "rerun" command to postprocess a previously recorded trajectory, with a suitably large cutoff and the force computation otherwise turned off.
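a quick way to check how much new information more frequent sampling would actually add is to estimate the autocorrelation time of whatever scalar you are averaging (e.g. S at a single q value, recorded every dump interval). a rough numpy sketch, purely for illustration:

```python
import numpy as np

def correlation_time(samples):
    """Integrated autocorrelation time, in units of the sampling interval.
    If this comes out much larger than 1, sampling more often than the
    correlation time adds almost no independent information.
    """
    x = np.asarray(samples, dtype=float) - np.mean(samples)
    acf = np.correlate(x, x, mode='full')[len(x) - 1:]  # lags 0, 1, 2, ...
    acf = acf / acf[0]                             # normalize C(0) = 1
    zero = np.where(acf <= 0.0)[0]                 # truncate at first zero crossing
    cut = zero[0] if len(zero) else len(acf)
    return 0.5 + acf[1:cut].sum()                  # tau_int = 1/2 + sum_{t>0} C(t)
```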

> complication is (I think - correct me if I'm wrong) each processor would
> need to be able to access the position of each atom in order to get low
> wavenumbers. I'm not sure how easy this is to do within the current LAMMPS,
> i.e. do any fixes/computes gather all atom positions together onto each
> processor? Am I missing something?

no, you would have to do extra communication to collect the position data onto a single processor. this is required for some dump file formats and for interfaces like USER-COLVARS (which runs in serial, but benefits from the fact that usually only a very small number of atoms participate in a collective variable), and it can affect parallel scaling.

> Does anyone have a good idea of where to start for writing a compute to
> compute S(q) in parallel within LAMMPS? (directly, including all particle
> pairs). I.e. would this need to be done "from scratch"?

probably. the challenge is to parallelize it and to make it scale with system size. to make it palatable, you would have to set up a ring buffer scheme (for which code exists in LAMMPS, as it is used by shake and fix rigid/small, IIRC): each processor deals with all pairs involving its local atoms, and the local atoms of all other subdomains are circulated around a ring of buffers, taking care not to double count pairs. certainly doable, but most certainly also a project for a person that *does* want to dig deeper into the intestines of LAMMPS.

in short:
- it would be more convenient and a useful addition to lammps' analysis features
- it would be quite a bit of programming/debugging/lammps-learning work
- it may not have the effect you are looking for

axel.

p.s.: if you can get hold of a copy of the paper below (i don't think i have one left, and there is no online access beyond 1998), you can see that i've spent some time on this kind of subject back in the time when men were *real* men, computers were *real* computers and little fluffy beings from alpha centauri were *real* little fluffy beings from alpha centauri (and i had just started as a grad student).
A. Kohlmeyer, W. Witschel, E. Spohr, "Long-range structures in bulk water", Z. Naturforsch. 52a, 432-434 (1997).

To augment what Axel said, if you want to compute something that requires each atom to see all the other atoms (i.e. N^2), you don't want to collect all atoms onto every proc. That won't fit in memory for big problems. What LAMMPS does for this is circulate the atoms around a "ring" of all the procs in P stages, where P = # of procs. It's not cheap (esp. as P gets big), but it's a reasonable method. However, LAMMPS currently only does it for one-time setup operations like computing the special bond topologies. You could do it in a compute for S(q), but you'd want to warn the user it's expensive. It's not hard to use this approach since there is a Comm::ring() method that does the communication and makes a callback (e.g. to the compute) each time a processor gets a set of atoms from another proc. So your compute would get P callbacks to allow it to do the N^2 operations in P chunks.
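Sketched in mpi4py rather than with the real Comm::ring() machinery (so purely illustrative), the pattern looks like this: the block of local coordinates travels around the ring in P stages, and at each stage a rank pairs its own atoms against the visiting block, so the full double sum gets assembled without any proc ever holding all atoms.

```python
import numpy as np
from mpi4py import MPI

def s_of_q_ring(local_positions, q, n_total):
    """S(q) for one q vector, using a ring of P stages: at each stage a
    rank pairs its own atoms against the visiting block of positions.
    Across all ranks and stages every ordered pair (j, k) appears exactly
    once, so the accumulated total is |sum_j exp(i q . r_j)|^2, and
    dividing by the total atom count n_total gives S(q).
    """
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    left, right = (rank - 1) % size, (rank + 1) % size

    my_phase = local_positions @ q                # q . r_j for my atoms
    visiting = local_positions                    # stage 0: pair against myself
    partial = 0.0
    for _ in range(size):
        vis_phase = visiting @ q
        dphi = my_phase[:, None] - vis_phase[None, :]
        partial += np.cos(dphi).sum()
        # pass the visiting block one step around the ring
        visiting = comm.sendrecv(visiting, dest=right, source=left)

    return comm.allreduce(partial, op=MPI.SUM) / n_total
```

Per-proc memory stays at a couple of blocks of N/P positions, and each stage does an (N/P) x (N/P) chunk of the pair work, which is the P-chunk structure described above.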

Steve