[lammps-users] Cluster statistics monitoring during colloidal aggregation simulation

Hello LAMMPSians,

I want to run parallel simulations of colloidal aggregation of initially uniformly distributed, monodisperse spherical particles in a periodic 3D cell (preferably using an algebraic or tabulated potential, but without explicit solvent). I am interested in monitoring various properties of the aggregated clusters as a function of simulated time, e.g. cluster size distributions, intra-cluster connectivity, effective volume fraction of clusters, radius of gyration, etc. I would also like to be able to run the simulation until some criteria based on these quantities are fulfilled.

I imagine that there are two basic approaches to this: implement a fix to calculate and dump the desired quantities periodically, or set up a restart-run-analyse loop using, e.g., LAMMPS, a shell script, and a self-written program to calculate the desired properties. Getting efficient parallel execution with either strategy is, however, problematic. In my experience, the latter strategy does not interact well with some job queueing systems - I have had similarly structured programs, submitted to dedicated queues for multi-CPU jobs, keel over and die after only the first parallel LAMMPS run. The reason has been that the scheduler sees that a single parallel MPI run has completed, and hence concludes that the entire job is finished.

I would be grateful for any advice or pointers to prior code developed for such a purpose. I would prefer to avoid reinventing the wheel if at all possible, but I am capable of code development myself (and might be able to give some development work to students). I have a particular interest in exploiting IBM Cell processors and GPUs, so any suggestions along those lines would be especially welcome.

Many thanks, as always, in advance for your collective wisdom!

Steve Kirk

If you are willing to write a fix and/or compute that
calculates your quantities of interest and exposes
them to the rest of LAMMPS in the standard way
(for output, averaging, etc. - see section 4.15 of the
manual), then it is not hard to write a loop in
your input script that does a long run in small
chunks, with an "if" test in the loop that breaks out
when some condition is met. The condition could
involve evaluating a variable that queries your
computes, fixes, etc. See the doc page for the "if"
command for a simple example.
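
For example (an untested sketch - "clprop" stands in for
whatever fix/compute you write, and the threshold is
arbitrary):

  # "clprop" is a placeholder for a user-written fix/compute
  # that exposes a global scalar, e.g. the largest cluster size
  variable     big equal c_clprop
  thermo_style custom step temp c_clprop  # keeps the compute current

  variable     i loop 100                 # at most 100 chunks
  label        loop
  run          10000                      # one short chunk of the long run
  if           "${big} > 500" then "jump SELF break"
  next         i
  jump         SELF loop
  label        break
  print        "Aggregation loop finished"

Listing c_clprop in the thermo output is one way to make
sure the compute is invoked during each run, so its value
is current when the variable is evaluated between runs.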

The tricky part of your description might be calculating
quantities like cluster connectivity efficiently in parallel.
One rule of thumb is that if you can't do the calculation
"locally", meaning with neighbor list info for atoms
nearby a central atom, then it will be hard to do
in parallel.

Steve

Hello Steve (a Happy 2009 to you!),

> If you are willing to write a fix and/or compute that
> calculates your quantities of interest and exposes
> them to the rest of LAMMPS in the standard way
> (for output, averaging, etc. - see section 4.15 of the
> manual), then it is not hard to write a loop in
> your input script that does a long run in small
> chunks, with an "if" test in the loop that breaks out
> when some condition is met. The condition could
> involve evaluating a variable that queries your
> computes, fixes, etc. See the doc page for the "if"
> command for a simple example.

OK, I will look into the docs more closely. The in-script loop construct and conditional test look very useful for this application.

> The tricky part of your description might be calculating
> quantities like cluster connectivity efficiently in parallel.
> One rule of thumb is that if you can't do the calculation
> "locally", meaning with neighbor list info for atoms
> nearby a central atom, then it will be hard to do
> in parallel.

Yes, that's what I suspected. Finding nearby neighbours should be fast thanks to the pre-existing neighbour lists. Assuming that membership of a cluster is determined by whether the centre-of-mass distance from the particle of interest to a particle already labelled as belonging to that cluster is less than some threshold, one approach might be for all processors to independently calculate partial clusters within their own subdomains and then merge this information, taking care to handle clusters that span more than one subdomain. It seems that the merge operation is the unavoidable non-parallel step, and unfortunately it will probably have to be done at every cluster evaluation, because the clusters drift during the simulation (I plan to use a Langevin thermostat with negligible viscosity in these simulations).
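
To make the merge step concrete, here is a rough serial
union-find sketch in C++ (fixes/computes are written in C++
anyway). Everything in it is illustrative - in reality the
provisional labels would come from per-subdomain cluster
finding, and the boundary-spanning label pairs from an MPI
gather, neither of which is shown:

  // Merge provisional per-subdomain cluster labels into global
  // clusters, given pairs of labels linked across subdomain
  // boundaries (e.g. gathered with MPI_Allgatherv).
  #include <cstdio>
  #include <vector>

  struct UnionFind {
    std::vector<int> parent;
    explicit UnionFind(int n) : parent(n) {
      for (int i = 0; i < n; ++i) parent[i] = i;
    }
    int find(int x) {                 // with path halving
      while (parent[x] != x) x = parent[x] = parent[parent[x]];
      return x;
    }
    void unite(int a, int b) {        // merge two provisional clusters
      a = find(a); b = find(b);
      if (a != b) parent[b] = a;
    }
  };

  int main() {
    const int nlabels = 6;            // total provisional labels (made up)
    const int npairs = 3;
    const int pairs[npairs][2] = {{0,3}, {3,5}, {1,4}};

    UnionFind uf(nlabels);
    for (int k = 0; k < npairs; ++k)
      uf.unite(pairs[k][0], pairs[k][1]);

    // Atoms carrying provisional label i belong to global cluster
    // uf.find(i); cluster sizes, radii of gyration etc. then follow
    // from ordinary parallel reductions.
    for (int i = 0; i < nlabels; ++i)
      std::printf("label %d -> cluster %d\n", i, uf.find(i));
    return 0;
  }

The appeal of doing it this way is that the serial part only
scales with the number of provisional labels and boundary
crossings, which should be far smaller than the number of
atoms.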