LAMMPS BLAS Benchmark

Dear LAMMPS Users,

We are currently looking at the proportion of time that LAMMPS spends in BLAS, and are looking for some advice about which benchmark to use.

Ideally, we are looking for a benchmark that fits the following criteria:

  • Makes heavy use of BLAS+LAPACK.
  • Scales well to a high number of nodes.
  • A benchmark that is representative of the work done by the LAMMPS community, and ideally what they will want to do in future years on the largest scale supercomputers.

We have seen that the packages awpmd and atc directly use BLAS. Are these commonly used by the community? Do these packages have to be enabled in order to use BLAS?

Any feedback or advice on this would be gratefully received.

Cheers,

Harry Waugh

High-Performance Computing PhD Student
University of Bristol

> Dear LAMMPS Users,
>
> We are currently looking at the proportion of time that LAMMPS spends in BLAS, and are looking for some advice about which benchmark to use.

sorry, but that number is in practice zero with a few exceptions.

> Ideally, we are looking for a benchmark that fits the following criteria:
>
>   • Makes heavy use of BLAS+LAPACK.
>   • Scales well to a high number of nodes.
>   • A benchmark that is representative of the work done by the LAMMPS community, and ideally what they will want to do in future years on the largest scale supercomputers.
>
> We have seen that the packages awpmd and atc directly use BLAS. Are these commonly used by the community?

no. and i doubt that the BLAS functions are time critical.

there are other packages that use some BLAS functions, e.g. LATTE or USER-PLUMED, but only the former (as a tight-binding code) is likely to make significant use of linear algebra, and i don't think you can currently use it in LAMMPS with more than 1 MPI rank.
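
for a sense of the kind of linear algebra a tight-binding code relies on: the dominant cost is typically a dense symmetric eigensolve of the hamiltonian at every step. the toy example below only sketches that pattern (the matrix contents, size, and build flags are made up for illustration and are not taken from LATTE):

```cpp
// toy_tb_diag.cpp -- illustration only: the kind of dense LAPACK call a
// tight-binding code spends much of its time in, a symmetric eigensolve.
// typical build: g++ toy_tb_diag.cpp -llapacke -llapack -lblas
#include <lapacke.h>
#include <vector>
#include <cstdio>
#include <cmath>

int main() {
    // toy "hamiltonian": a symmetric n x n matrix; a real tight-binding H
    // is built from the atomic structure, this is just a placeholder pattern
    const int n = 500;
    std::vector<double> H(n * n), eigenvalues(n);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            H[i * n + j] = std::exp(-std::fabs((double)(i - j)));  // symmetric by construction

    // LAPACK symmetric eigensolver: O(n^3) work, dominated by BLAS-3 kernels
    lapack_int info = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'V', 'U', n,
                                    H.data(), n, eigenvalues.data());
    if (info != 0) {
        std::fprintf(stderr, "dsyev failed, info = %d\n", (int)info);
        return 1;
    }
    std::printf("lowest eigenvalue: %.6f\n", eigenvalues[0]);
    return 0;
}
```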

the most time-consuming steps in typical LAMMPS calculations are the computation of the non-bonded forces, the assembly of the neighbor lists and, if used, the 3d FFT step during PPPM long-range coulomb. the FFT is particularly problematic for large numbers of MPI ranks: the grid can only be distributed in 2d (each 1d FFT needs the complete data along that dimension on one rank), so switching between dimensions requires a transpose with all-to-all communication. for this part, the fraction of time spent in communication, not computation, can quickly become dominant.
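
to make that transpose step concrete, here is a small standalone sketch (illustration only, not LAMMPS code) of a row-to-column transpose done with MPI_Alltoall, the communication pattern a 2d-decomposed parallel 3d FFT performs each time it switches the local dimension. each rank exchanges a block with every other rank, so the messages get smaller and more numerous as the rank count grows, which is why communication quickly dominates at scale:

```cpp
// toy_transpose.cpp -- illustration only: distributed matrix transpose via
// MPI_Alltoall, the pattern used to re-localize dimensions in a parallel FFT.
// build/run: mpicxx toy_transpose.cpp -o toy_transpose && mpirun -np 4 ./toy_transpose
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // global N x N grid, distributed by rows: each rank owns nloc = N/nprocs rows
    const int N = 8 * nprocs;          // toy size, chosen to be divisible
    const int nloc = N / nprocs;

    std::vector<double> local(nloc * N), packed(nloc * N), recvbuf(nloc * N);
    for (int i = 0; i < nloc; ++i)
        for (int j = 0; j < N; ++j)
            local[i * N + j] = (rank * nloc + i) + j * 1e-3;  // arbitrary fill pattern

    // pack: group the columns destined for each rank into contiguous blocks,
    // because MPI_Alltoall sends equal-sized contiguous chunks to every rank
    for (int p = 0; p < nprocs; ++p)
        for (int i = 0; i < nloc; ++i)
            for (int j = 0; j < nloc; ++j)
                packed[(p * nloc + i) * nloc + j] = local[i * N + p * nloc + j];

    // the all-to-all: every rank exchanges an nloc x nloc block with every
    // other rank; more ranks means more and smaller messages per transpose
    MPI_Alltoall(packed.data(), nloc * nloc, MPI_DOUBLE,
                 recvbuf.data(), nloc * nloc, MPI_DOUBLE, MPI_COMM_WORLD);

    // unpack into the transposed layout: this rank now owns nloc *columns*
    // of the original grid, stored as rows of length N
    std::vector<double> transposed(nloc * N);
    for (int p = 0; p < nprocs; ++p)
        for (int i = 0; i < nloc; ++i)       // row index within the block from rank p
            for (int j = 0; j < nloc; ++j)   // column index within my column block
                transposed[j * N + p * nloc + i] = recvbuf[(p * nloc + i) * nloc + j];

    if (rank == 0)
        std::printf("transpose done: each of %d ranks sent %zu doubles\n",
                    nprocs, packed.size());

    MPI_Finalize();
    return 0;
}
```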

> Do these packages have to be enabled in order to use BLAS?

yes.

axel.