[lammps-users] Build with threaded FFTW

Dear LAMMPS users,

I have been trying to build LAMMPS 29Oct20 with threaded FFTW-3.3.10, but cmake fails to recognise the FFTW library with OMP support.

FFTW was built with both shared and static libraries, MPI, SIMD and threading support (both threads and OpenMP), and is installed in /opt/fftw-3.3.10. It passes the provided tests with 1 or several threads.

The cmake call for building LAMMPS is:

cmake -C ../cmake/presets/most.cmake -C ../cmake/presets/nolib.cmake -D CMAKE_INSTALL_PREFIX=/opt/lammps-29Oct20 -D BUILD_OMP=yes -D PKG_USER-OMP=yes -D PKG_USER-INTEL=yes -D FFT=FFTW3 -D FFTW3_INCLUDE_DIR=/opt/fftw-3.3.10/include/ -D FFTW3_LIBRARY=/opt/fftw-3.3.10/lib/libfftw3_omp.so -D FFT_FFTW_THREADS=on ../cmake

It fails with

CMake Error at Modules/Packages/KSPACE.cmake:33 (message):
Need OpenMP enabled FFTW3 library for FFT_THREADS
Call Stack (most recent call first):
CMakeLists.txt:378 (include)

If I remove -D FFT_FFTW_THREADS=on and replace the FFTW library with libfftw3.so, LAMMPS builds without problem, using non-threaded FFTW.

Rebuilding FFTW with only OMP threads doesn't help.

Has anyone managed to get LAMMPS to work with threaded FFTW ?

Etienne

Yes, I have built LAMMPS with threaded FFT. First off, you should say why you care. Its benefits are limited to a few applications, most notably running the KOKKOS package with OpenMP while trying to reach extreme strong scaling.

In most KSPACE applications the choice and parallelization of the FFT library has very little impact. FFTW may be 20-30% faster than KISS FFT, and MKL FFT may be faster still, but when KSpace takes 20% or less of the total CPU time and the FFT is only a part of KSpace, the effect on overall performance is small. It is often more beneficial to use the optimized compute kernels in the OPT, OPENMP, or INTEL package (even without OpenMP!) and then, even more important, to use smart settings in your input: one bad choice in the input can cost far more performance than an optimized compilation can gain. So you may be trying to optimize something that is in little need of optimization.

When using CMake, you should initially not apply any explicit FFT settings, but let CMake try to detect the best option available, and then augment those. In order to “guide” CMake in the right direction, it is important to first review what CMake detects without any additional settings.
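
One way to review what CMake detected is to look at the FFT-related entries in the cache of the build directory. The following is only a sketch: the helper name show_fft_settings is made up for this illustration, and it assumes cmake has already been run once in the given directory.

```shell
# Hypothetical helper: print every FFT-related entry from a CMake cache.
# $1 = build directory in which cmake has already been run
show_fft_settings() {
    cache="$1/CMakeCache.txt"
    if [ -f "$cache" ]; then
        # Case-insensitive match catches FFT, FFTW3_LIBRARY, FFT_FFTW_THREADS, ...
        grep -i 'fft' "$cache"
    else
        echo "no CMakeCache.txt in $1; run cmake there first" >&2
        return 1
    fi
}
```

Calling "show_fft_settings ." from the build directory then lists the detected values. Running "cmake -LA -N ." in the same directory lists all cached variables without reconfiguring.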

At any rate, you have to understand that passing the “fftw3_omp” library instead of the FFTW core library cannot work. The _omp library provides additional functions, and both libraries are needed for linking. There are ways to specify this explicitly, but those should not be needed if CMake does its job.
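
To see the split between the two libraries, one can list where two representative functions are defined. This is a sketch that assumes GNU nm is available and that the libraries sit in the lib folder of the install prefix from the original post; the helper name list_fftw_symbols is made up for this illustration.

```shell
# Assumed library directory, derived from the install prefix in the original post
FFTW_LIBDIR=/opt/fftw-3.3.10/lib

# Hypothetical helper: report whether a transform planner (fftw_plan_dft_3d)
# and a threading entry point (fftw_init_threads) are defined in a library.
# $1 = path to a shared library
list_fftw_symbols() {
    if [ -r "$1" ]; then
        nm -D --defined-only "$1" | grep -E ' fftw_(plan_dft_3d|init_threads)$' \
            || echo "  (neither symbol defined in $1)"
    else
        echo "  ($1 not readable)"
    fi
}

for lib in libfftw3.so libfftw3_omp.so; do
    echo "$lib:"
    list_fftw_symbols "$FFTW_LIBDIR/$lib"
done
```

One would expect the planner to show up only in libfftw3.so and the threading entry point only in libfftw3_omp.so, which is why a threaded build has to link against both.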

Axel.

p.s.: why use the “no-lib” preset in combination with “most”? In CMake the “most” preset is “safe”, i.e. it contains all kinds of packages that can be safely compiled, and the compilation of the sources in the lib folder is integrated into the main build. The “no-lib” preset is merely provided for symmetry with the traditional make process (where it makes much more sense, although there we recently added a “make yes-most” that skips packages with sources in lib, so it is “safe” to use there, too).

Hello Axel,

Thank you for this answer.
I would like to try threaded FFT for benchmark purposes. I understand from your message that I should not expect a big difference compared to the non-threaded version, and that the bottleneck could be somewhere else.

The above option list was built with the incremental method you described. Without FFTW3_* options, FFTW3 is not found by cmake and the KISS library is used. With the provided options (using libfftw3.so and not libfftw3_omp.so as you pointed out), FFTW is correctly identified, and the build is successful. However, adding FFT_FFTW_THREADS=on results in the same error as before, even with the corrected library.

Etienne

ps: Thanks for your explanations on “no-lib” preset. I will remove it in the final build.

Ok. So here is what you can do:

  1. The “FindFFTW3” CMake script in LAMMPS looks for a pkg-config configuration file.
    So if your FFTW3 installation is not found by default, it may be because the folder containing the fftw3.pc file is missing from your PKG_CONFIG_PATH environment variable.
    You can check for that with: pkgconf --path fftw3
    It should return the path to the file; on my machine it is /usr/lib64/pkgconfig/fftw3.pc

  2. If you want to or have to specify the library and include paths explicitly, you can try using:
    -D FFTW3_INCLUDE_DIR=<fftw3_include_dir> -D FFTW3_LIBRARY=<fftw3_lib> -D FFTW3_OMP_LIBRARY=<fftw3_omp_lib> -D BUILD_OMP=on -D FFT_FFTW_THREADS=on

Looks like the option to pass the fftw3_omp library was overlooked when updating the manual. I’ll add that.
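
Applied to the installation from the original post, the two options might look like the sketch below. It assumes that FFTW placed fftw3.pc in lib/pkgconfig under its install prefix (on some systems this is lib64 instead), and it only prints the cmake command as a dry run. The shell variable names are chosen for this illustration.

```shell
# Assumed install prefix, taken from the original post
FFTW_PREFIX=/opt/fftw-3.3.10

# Option 1: make fftw3.pc visible to pkg-config, and thereby to CMake.
# The lib/pkgconfig location is an assumption; adjust if yours differs.
export PKG_CONFIG_PATH="$FFTW_PREFIX/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists fftw3; then
    echo "fftw3.pc found, version $(pkg-config --modversion fftw3)"
else
    echo "fftw3.pc not found via pkg-config"
fi

# Option 2: name the headers and BOTH libraries explicitly
FFTW3_INCLUDE_DIR="$FFTW_PREFIX/include"
FFTW3_LIBRARY="$FFTW_PREFIX/lib/libfftw3.so"
FFTW3_OMP_LIBRARY="$FFTW_PREFIX/lib/libfftw3_omp.so"

# Dry run: print the configure command. Remove "echo" to execute it
# from a build directory next to the cmake folder.
echo cmake -C ../cmake/presets/most.cmake \
    -D BUILD_OMP=on -D FFT=FFTW3 -D FFT_FFTW_THREADS=on \
    -D FFTW3_INCLUDE_DIR="$FFTW3_INCLUDE_DIR" \
    -D FFTW3_LIBRARY="$FFTW3_LIBRARY" \
    -D FFTW3_OMP_LIBRARY="$FFTW3_OMP_LIBRARY" \
    ../cmake
```

If option 1 already lets CMake find FFTW3, the three explicit FFTW3_* settings should not be needed.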

There is a little FFT benchmark built into LAMMPS when you add the command:

kspace_modify fftbench yes

to a suitable simulation.

That should give you some gauge of the performance. Please also note that FFTs are notorious for thrashing CPU caches, so you may see a difference in performance between 1 process and multiple processes, specifically if you have so many processes that you start running out of memory bandwidth.
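
A sketch of how such a comparison could be scripted is shown below. It assumes the executable is called lmp, that the launcher is mpirun, that the thread count is picked up from OMP_NUM_THREADS, and that in.kspace is your own input file (a hypothetical name) containing the fftbench line. It only prints the commands so they can be reviewed first.

```shell
# Hypothetical helper: print the command for one benchmark run.
# $1 = number of MPI processes, $2 = OpenMP threads per process
bench_cmd() {
    echo "OMP_NUM_THREADS=$2 mpirun -np $1 lmp -in in.kspace -log log.np$1.t$2"
}

# Same number of processes, increasing thread count per process
for t in 1 2 4; do
    bench_cmd 4 "$t"
done
```

Each run writes its own log file, so the FFT timings reported by the benchmark can be compared side by side afterwards.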

When using CMake, a good way to tweak the final settings is to use the TUI or GUI version, i.e. ccmake or cmake-gui, after the initial configuration. That makes it easy to check the list of available and installed packages and to quickly select more than what is suggested as the “safe” default without external dependencies. For example, many deployments have plumed installed, so adding the PLUMED package is easy, but since downloading and compiling plumed as part of the CMake build is very time consuming (and also results in limited functionality, since the plumed command line tools are not installed), it is not included in any of the default compilation presets (except “all”).

A final remark: we are a couple of weeks away from releasing a new stable version to supersede the 29 October 2020 version. So you may want to upgrade to the release candidate version (20 September 2021) to evaluate it (there were some internal changes, but also several attractive additions of machine learning potential packages) and then deploy the new stable version when it is released. Any feedback on problems with the release candidate would certainly be welcome.

salut,
axel.

Hi Axel,

The hints worked, LAMMPS is built with threaded FFTW3.
The FFT benchmark in kspace is very convenient !
I will have a look at the candidate version and send some feedback if necessary.

Thank you again for your help,
Etienne