Communicate all atom positions to current proc

Hi everyone,

I need to calculate the radial distribution function (RDF) over as many atoms as possible, which requires going beyond the force and ghost-atom cutoffs.

So far I've been doing this with an external code that calls LAMMPS as a library, but now I am trying to make a trivial modification to the compute_rdf method so that it just does an O(N^2) search over all pairs.

This works out of the box with a single processor. However, for more than one proc I need access to every global atom position, beyond the ghosts. Is there a function I am missing that would let me access those?


you have to do a communication to get those in one spot, and for
that you need to provide suitable storage. have a look at the dcd
and xtc dump styles for how you can do it, since those need all
atoms - sorted by their tags - in one chunk on the rank 0 processor
to write out a trajectory frame.

axel.

Or you could look at the Comm::ring() function which
will let each proc see all the data (one piece at a time)
owned by all other procs.

I will add that, conceptually, what you are doing is
inefficient in a parallel code. You'd probably be better
off doing an N^2 operation in post-processing. Or you could
do it on dump snapshots with the rerun command,
defining a long cutoff in the rerun script.
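For reference, a rerun script along those lines might look something like the sketch below (the file names, bin count, and 25.0 cutoff are placeholders; pair_style zero provides a long neighbor cutoff without computing any forces):

```
# hypothetical post-processing script: long-cutoff RDF over dump snapshots
units        lj
atom_style   atomic
read_data    system.data           # placeholder data file
pair_style   zero 25.0             # long cutoff, no forces computed
pair_coeff   * *
compute      myRDF all rdf 200 cutoff 25.0
fix          out all ave/time 1 1 1 c_myRDF[*] file rdf.out mode vector
rerun        dump.lammpstrj dump x y z
```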

Steve

I see your point, but the problem with doing a rerun is that the dump files take up too much disk space, which I wanted to avoid.

On the other hand, I understand why the code would be slow, but if I want to calculate the RDF for every pair of atoms (up to boxsize/2), is there any way I can avoid looping over all pairs?

thanks

if I want to calculate the RDF for every pair of atoms (up to boxsize/2), is there any way I can avoid looping over all pairs?

How would you imagine you could do that?

If every pair contributes to the RDF histogram,
then at a minimum you have to consider the pair
and compute the distance between the 2 atoms to
know what bin it contributes to.
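That minimum work per pair amounts to one distance and one bin lookup; a small Python sketch (the function name and the dr/nbins values are made up for illustration):

```python
import math

def bin_index(xi, xj, dr, nbins):
    """Return the histogram bin this pair contributes to,
    or None if the pair lies beyond the histogrammed range."""
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
    b = int(r / dr)
    return b if b < nbins else None

# two atoms 0.25 apart fall into bin 2 when the bin width dr = 0.1
assert bin_index((0.0, 0.0, 0.0), (0.25, 0.0, 0.0), dr=0.1, nbins=50) == 2
```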

Steve

if you want to do brute force RDF, you should try the GPU accelerated
implementation in VMD. this is one of the cases where GPUs can really
help a lot. check out http://dx.doi.org/10.1016/j.jcp.2011.01.048 for
technical details.

axel.

OK, I misunderstood something about the "inefficiency" of the code. If I want to include as many atoms as possible (up to r_max = boxsize/2), I can't think of anything more "efficient" than looping over all N * (N - 1) / 2 pairs and selecting those with distance less than the cutoff; that's why I asked whether there is any way to avoid looping over them all. I get that the RDF calculation would be slow, but why is it inefficient? Do you mean it would be inefficient to add it to LAMMPS, and that post-processing the same number of frames would be faster overall?

> I can't think of anything more "efficient" than looping over all
> N * ( N - 1 ) / 2 pairs and then selecting those with distance less
> than the cutoff, that's why I asked about any other way I can avoid
> looping over them all.

no. this is an O(N^2) problem.

> I get the rdf calculation would be slow, but why is it inefficient?
> Do you mean it would be inefficient to add it to LAMMPS and it would
> be overall faster if postprocessing the same amount of frames?

the problem with doing this in LAMMPS is that it doesn't make sense
to use a neighbor list, and it doesn't make sense to apply a cutoff
either: just compute every pair distance and fill a histogram.

*if* you want to do this inside of LAMMPS, then ideally you don't
want to collect all atoms on one processor. instead, build a buffer
of size MAX(atom->nmax over all MPI ranks) and initialize it with
the local atoms. then compute all pairs between your local atoms
and the buffer and fill your local histogram. then communicate the
buffer around a ring to the next MPI rank and - again - histogram
all pair distances, and so on, until the buffer has passed around
all processors. finally, reduce the histogram across ranks and
normalize it.
this way you run in parallel with MPI and save a lot of memory.
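A serial Python sketch of this ring scheme (MPI and Comm::ring() replaced by directly indexing the neighbor rank's buffer; the toy data, rank count, and bin settings are made up, and the minimum-image convention is omitted for brevity):

```python
import math
import random

def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def rdf_histogram_ring(atoms_per_rank, nbins, rmax):
    """Histogram all pair distances by passing each rank's atom buffer
    around a ring; under MPI each 'buf' would arrive via Comm::ring()."""
    dr = rmax / nbins
    hist = [0] * nbins                  # plays the role of the reduced histogram
    nprocs = len(atoms_per_rank)
    for rank in range(nprocs):          # each iteration = one MPI rank's work
        local = atoms_per_rank[rank]
        for step in range(nprocs):      # buffer makes one full trip around the ring
            buf = atoms_per_rank[(rank + step) % nprocs]
            for i, xi in local:
                for j, xj in buf:
                    if i == j:
                        continue        # skip self pairs on the first hop
                    r = distance(xi, xj)
                    if r < rmax:
                        hist[int(r / dr)] += 1
    return hist  # every unordered pair is counted twice (once from each side)

random.seed(1)
atoms = [(i, (random.random(), random.random(), random.random()))
         for i in range(60)]
ranks = [atoms[r::4] for r in range(4)]   # scatter atoms over 4 fake ranks
ring = rdf_histogram_ring(ranks, nbins=10, rmax=1.0)

# brute-force O(N^2) reference: each unordered pair counted once
direct = [0] * 10
for a in range(len(atoms)):
    for b in range(a + 1, len(atoms)):
        r = distance(atoms[a][1], atoms[b][1])
        if r < 1.0:
            direct[int(r / 0.1)] += 1
assert ring == [2 * c for c in direct]
```

Under real MPI each rank would hold only its own local list plus the single traveling buffer, which is what saves memory compared to gathering all N atoms on every processor.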

but since this calculation is so simple, it may be very efficient
to do it with multiple GPUs in post-processing, if you have some
available.

axel.

And to do what Axel described, you should use
the Comm::ring() function, which handles all the MPI
communication for shipping a buffer cyclically to
all procs. You just have to write the on-processor
loops to calculate the RDF. Since the calculation
is N^2, the communication cost will not be significant,
and LAMMPS will do it in parallel, at a cost of
N^2/P. However, for big N that is still expensive.

Steve