FFTW MPI Compilation Question

Hi all,
I had some problems compiling LAMMPS to use the FFTW3 library. Using CMake, I can get the code to report that the FFT library being used is FFTW, but when building the old way with make, the compiled binary reports KISS.

  1. Does CMake alone update the reported library, or will make always report KISS?
  2. For FFTW3, should we be using the MPI version of the library (-lfftw3_mpi vs. -lfftw3)? The official documentation makes no reference to fftw3_mpi, so I want to confirm that we should not be compiling the version of FFTW with MPI support.
  1. FFTW can be used with both build systems, but only CMake will auto-detect it. For GNU make you need to edit your machine makefiles. This is explained in detail in the manual.
  2. LAMMPS has its own 3d FFT parallelization and thus only uses serial 1d FFTs.
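For reference, the CMake route can be sketched roughly as follows (a minimal example, assuming FFTW3 is installed somewhere CMake can find it; FFT=FFTW3 is the documented option for selecting the FFT library):

```shell
# Build LAMMPS with CMake, explicitly selecting FFTW3 as the FFT library.
# Run from the top of the LAMMPS source tree.
mkdir build && cd build
cmake -D FFT=FFTW3 ../cmake
cmake --build .
```

If FFT is not set, CMake will auto-detect an available FFT library and fall back to KISS FFT when none is found.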

Apologies; I did leave out that I edited the FFT path and library in MAKE/Makefile.mpi to point to the FFTW3 installation as detailed in the manual. The proper question is “Will the output properly reflect the change from KISS to FFTW3 using the traditional make or is this functionality limited to cmake?” This is not explicitly stated on the FFT Build page.

Thanks for clarifying the 3d FFT parallelization part. Can the docs be updated to specifically call out that users should not try linking to fftw3_mpi?

Paths and library are not sufficient. The define is needed as well. Also, it has to be done with the correct syntax.
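To illustrate the point about the define: the relevant lines in a machine makefile (e.g. src/MAKE/Makefile.mpi) look roughly like the sketch below. The paths are placeholders for your own installation; -DFFT_FFTW3 is the define that actually switches LAMMPS from KISS to FFTW3.

```make
# FFT settings in the machine makefile (e.g. src/MAKE/Makefile.mpi).
FFT_INC  = -DFFT_FFTW3            # the define: without it, LAMMPS compiles in KISS FFT
FFT_PATH = -L/path/to/fftw3/lib   # placeholder: path to your FFTW3 installation
FFT_LIB  = -lfftw3                # serial FFTW3; -lfftw3_mpi is not needed
```

Setting only FFT_PATH and FFT_LIB without the FFT_INC define links the library but leaves the KISS code in use, which matches the behavior described above.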

Do you really think we would write a software that says one thing and then does something else?

If we wrote down all the things that you should NOT do, the manual would be 10x as large. It works the other way around: the manual states what you need to do; if there is no mention of an MPI-parallel FFTW, then there is no need for it. You can link it if you want, but it won't have any effect.

BTW: the performance differential between FFTW and KISSFFT is negligible for most use cases.

Here are numbers for the rhodo benchmark on an Intel(R) Core™ i5-10210U CPU.

With FFTW3:

Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 16.137     | 16.137     | 16.137     |   0.0 | 75.23
Bond    | 0.66065    | 0.66065    | 0.66065    |   0.0 |  3.08
Kspace  | 1.2303     | 1.2303     | 1.2303     |   0.0 |  5.74
Neigh   | 2.836      | 2.836      | 2.836      |   0.0 | 13.22
Comm    | 0.047146   | 0.047146   | 0.047146   |   0.0 |  0.22
Output  | 0.00023305 | 0.00023305 | 0.00023305 |   0.0 |  0.00
Modify  | 0.52509    | 0.52509    | 0.52509    |   0.0 |  2.45
Other   |            | 0.01328    |            |       |  0.06

With KISSFFT:

Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 16.128     | 16.128     | 16.128     |   0.0 | 75.10
Bond    | 0.65104    | 0.65104    | 0.65104    |   0.0 |  3.03
Kspace  | 1.2841     | 1.2841     | 1.2841     |   0.0 |  5.98
Neigh   | 2.8291     | 2.8291     | 2.8291     |   0.0 | 13.17
Comm    | 0.046276   | 0.046276   | 0.046276   |   0.0 |  0.22
Output  | 0.00026304 | 0.00026304 | 0.00026304 |   0.0 |  0.00
Modify  | 0.52248    | 0.52248    | 0.52248    |   0.0 |  2.43
Other   |            | 0.01317    |            |       |  0.06

As you can see, the performance difference is less than 5% of the Kspace time, and since Kspace is such a small part of the total time, the total difference between the two runs is about 0.1%. That is much less than the impact of things like turbo boost, memory alignment, and processor/memory affinity.
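As a sanity check on those percentages, here is a small script with the timings copied from the two tables above, computing the relative difference in the Kspace section and in the total wall time:

```python
# Per-section timings (seconds) copied from the two benchmark tables above.
fftw3 = {"Pair": 16.137, "Bond": 0.66065, "Kspace": 1.2303, "Neigh": 2.836,
         "Comm": 0.047146, "Output": 0.00023305, "Modify": 0.52509, "Other": 0.01328}
kiss  = {"Pair": 16.128, "Bond": 0.65104, "Kspace": 1.2841, "Neigh": 2.8291,
         "Comm": 0.046276, "Output": 0.00026304, "Modify": 0.52248, "Other": 0.01317}

# Relative slowdown of KISS FFT vs. FFTW3 in the Kspace section alone.
kspace_diff = (kiss["Kspace"] - fftw3["Kspace"]) / fftw3["Kspace"]

# Relative difference in the total wall time summed over all sections.
total_fftw3 = sum(fftw3.values())
total_kiss = sum(kiss.values())
total_diff = (total_kiss - total_fftw3) / total_fftw3

print(f"Kspace difference: {kspace_diff:.1%}")  # ~4.4%, i.e. under 5%
print(f"Total difference:  {total_diff:.2%}")   # ~0.12%, i.e. about 0.1%
```

The Kspace gap of about 4.4% shrinks to roughly 0.1% of the total runtime, which is well inside normal run-to-run noise on a laptop CPU.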
