Optimising efficiency for large system optimisation

Dear GULP developers and users,

I’m trying to perform optimisation and consequently calculate the phonons of a very large system, i.e. ~50.000 atoms. (It is so large because I’m investigating a disordered supercell of a molecular crystal.) I know such a calculation might well be near the limit of what is numerically feasible. However, by optimising my code I would like to test how large I can go.

As I understand it, GULP’s optimisation process is a balancing act between memory usage and computing time. I’m using the limited-memory BFGS algorithm, since loading the full Hessian causes my calculation to crash (if I’m correct this must be due to memory limits). I’m running my calculation in parallel on a HPC. I am currently running some tests to optimise two parameters:

  • number of cores: the more cores, the faster the calculation (at least for a medium number of cores - it might taper off). However, the more cores I request, the longer my queuing time in the hpc.
  • lbfgs order: the larger, the more memory is required, but the faster the convergence.

For the phonon calculation (full eigenvectors at Gamma point), as far as I know there’s no way to make it use less memory. But luckily by requesting a large number of cores I was able to run this calculation. So it is just the optimisation step I’m stuck at at the moment.

My question: I’d happily do the above described tests myself, but before I waste time on this: has anyone tried optimising such large systems before? Does anyone know what the quantitative relation is between system size, lbfgs order, memory usage and convergence speed? And are there any data available on GULP’s parallel performance in terms of speed versus number of cores (especially: when does the speed taper off?)?

Finally, any other suggestions or comments on how to solve this problem are also very welcome:)



Hi Bernet

There are a lot of big questions there & so it’s hard to answer in detail. The first point to suggest is that perhaps you want to optimise with conjugate gradients as this tends to be a bit more robust for large systems in GULP. Of course, it’s going to take a very large number of steps since this is usually proportional to the number of degrees of freedom to be optimised. Near the end, when everything is largely in the quadratic regime, then it might be worth looking at a few steps with Newton-Raphson (given you’re going to do the phonons anyway). Parallel scaling depends on many things which are specific to your system and the computer you are using, so you’ll need to do these tests yourself. If you have charges then the important thing is to optimise the cost of a gradient calculation with respect to the real and reciprocal space cutoffs. The easy way to do this is to specify the real space cutoff for the Ewald sum (using the ewaldrealradius option). By adjusting this you can get large speed ups in some cases. You can also look at the “spatial” keyword and trying using domain decomposition (again optimising the domain size). Of course you should also think hard about whether you really need the full eigenvectors for 50,000 atoms…

Hi Julian,

Thank you for these tips, this is very helpful. I will try out the conjugate gradients algorithm. Thanks also for pointing me to the spatial decomposition option - I’m hopeful that will make a big difference in my system since I have lots of short-range interactions (and luckily no long-range interactions).