building LAMMPS with GPU support

I’m having trouble building LAMMPS with GPU support.

I have an 8-core AMD CPU together with a Pascal architecture graphics card. My card has support for OpenMP (which I already use for CPU-only programs), but not MPI, and I have no interest in installing MPI support. I have installed the NVIDIA HPC SDK, which includes the latest version of CUDA.

There are several issues with OpenMP and GPU:

  • The guide by Richard Berger (Temple U) is a broken link.
  • I can enable OpenMP only via cmake with -D BUILD_OMP=yes. I did not find an equivalent option using make.
  • I tried running cmake with -D GPU_API=cuda -D GPU_ARCH=sm_60. cmake ignored those options. No lmp_gpu executable was generated.

Since I am new to LAMMPS, I don’t know what to expect, but it appears that I cannot generate a code that will utilize my GPU.

When asking questions about using or installing LAMMPS, please always mention the exact LAMMPS version that you are using. In my answers below I am assuming that this is either the latest stable release (3 March 2020) or a later patch release.

I’m having trouble building LAMMPS with GPU support.

I have an 8-core AMD CPU together with a Pascal architecture graphics card. My card has support for OpenMP (which I already use for CPU-only programs), but not MPI, and I have no interest in installing MPI support.

but you should seriously consider installing an MPI library. if you run on a local machine with a linux distribution, installing an MPI library is trivial, as all distributions provide pre-compiled packages. LAMMPS has been designed from the ground up to support MPI parallelization very efficiently and - except in some extreme cases - is almost always more efficient with MPI than with OpenMP, because the domain decomposition strategy implicitly leads to better data locality and thus better cache efficiency, which is hugely important for good performance on modern CPUs. OpenMP support, in contrast, is “grafted on” and - by construction - applies only to parts of the calculation; with the current implementation, its overhead tends to grow with more threads. specifically in combination with the GPU package, using MPI can lead to a significant performance increase, as it results in better GPU utilization and parallelizes (and thus significantly speeds up) the non-GPU code (and if you have ever looked at Amdahl’s law, that can make a significant difference, especially when using a GPU).
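Amdahl’s law makes this concrete: if only part of the runtime is accelerated, the remaining serial part caps the overall speedup. A minimal sketch (the fractions and factors below are illustrative, not LAMMPS measurements):

```python
def amdahl_speedup(parallel_fraction, factor):
    """Overall speedup when only `parallel_fraction` of the runtime
    is accelerated by `factor` (Amdahl's law)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / factor)

# Even a 10x-faster GPU kernel gives less than 3.6x overall if it
# covers only 80% of the runtime; speeding up the remaining 20%
# (e.g. with MPI) is what pushes the total further.
print(round(amdahl_speedup(0.80, 10.0), 2))  # 3.57
```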

it would also be interesting to know what specific GPU hardware you have. not every GPU that is CUDA compatible is worth the hassle; some have been known to slow down calculations when the number of GPU cores is small and the memory bandwidth inside the GPU and between GPU and main memory is limited. This is significantly amplified if you are doing calculations (e.g. using fix npt or other variable cell algorithms) that depend on a very accurate stress tensor (which - unlike forces - typically has a large error in single precision and still a substantial error in mixed precision) and thus require compiling in double precision mode. most consumer grade GPUs have only limited support for double precision floating point math.

I have installed the NVIDIA HPC SDK, which includes the latest version of CUDA.

There are several issues with OpenMP and GPU:

  • The guide by Richard Berger (Temple U) is a broken link.

what link? most likely the information it would be pointing to is outdated by now, anyway.

besides, there are detailed and up-to-date instructions in the LAMMPS manual itself that explain how to compile, how to run and how to get the best performance when using OpenMP and MPI and GPUs.

  • I can enable OpenMP only via cmake with -D BUILD_OMP=yes. I did not find an equivalent option using make.

two comments on that. a) this is a required option to enable compiling with OpenMP support, but it does not automatically provide you with OpenMP-capable styles. those need to be enabled as well.
b) the explanation of the equivalent for GNU make is in this section: https://lammps.sandia.gov/doc/Build_basics.html#serial
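as a sketch of both routes (package and makefile names as of the 3 March 2020 version, where the OpenMP styles live in the USER-OMP package; check your own manual):

```shell
# CMake: compile with OpenMP support *and* install the OpenMP styles
cmake -D BUILD_OMP=yes -D PKG_USER-OMP=yes ../cmake

# traditional GNU make equivalent: enable the package, then build
# with a machine makefile whose compiler flags include -fopenmp
cd src
make yes-user-omp
make omp          # uses src/MAKE/OPTIONS/Makefile.omp
```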

  • I tried running cmake with -D GPU_API=cuda -D GPU_ARCH=sm_60. cmake ignored those options. No lmp_gpu executable was generated.

same as with OpenMP, adding those settings does not automatically include GPU code. those settings belong to the GPU package (which is one of two options to include GPU support in LAMMPS) and will only be considered if you also enable the corresponding package. without it, they are meaningless.

when compiling with the cmake build system, the executable will be called “lmp” unless you explicitly request a machine name.
in general, if you want to compile LAMMPS from source, you need to pick one of the two build systems exclusively and follow only the instructions pertinent to that build system. in the LAMMPS manual, the corresponding sections are clearly marked as to which build system they apply.
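putting this together, a minimal configure line might look like the following (sm_60 taken from your command line; the resulting binary is called lmp):

```shell
# enable the GPU package itself; only then do the GPU-specific
# settings take effect
cmake -D PKG_GPU=yes -D GPU_API=cuda -D GPU_ARCH=sm_60 ../cmake
cmake --build .
```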

Since I am new to LAMMPS, I don’t know what to expect, but it appears that I cannot generate a code that will utilize my GPU.

it also appears that you need to spend more time reading the manual, especially sections 3, 6, 7, and 4.

Axel.

Axel,

Thank you very much for the detailed comments. Some follow-up notes:

When asking questions about using or installing LAMMPS, please always mention the exact LAMMPS version that you are using. In my answers below I am assuming that this is either the latest stable release (3 March 2020) or a later patch release.

Yes - 3 March 2020, downloaded very recently. The CPU is a Ryzen 7 2700x running Ubuntu 18.04. Comments on GPU below.

I’m having trouble building LAMMPS with GPU support.

I have an 8-core AMD CPU together with a Pascal architecture graphics card. My card has support for OpenMP (which I already use for CPU-only programs), but not MPI, and I have no interest in installing MPI support.

but you should seriously consider installing an MPI library. if you run on a local machine with a linux distribution, installing an MPI library is trivial, as all distributions provide pre-compiled packages. LAMMPS has been designed from the ground up to support MPI parallelization very efficiently and - except in some extreme cases - is almost always more efficient with MPI than with OpenMP, because the domain decomposition strategy implicitly leads to better data locality and thus better cache efficiency, which is hugely important for good performance on modern CPUs. OpenMP support, in contrast, is “grafted on” and - by construction - applies only to parts of the calculation; with the current implementation, its overhead tends to grow with more threads. specifically in combination with the GPU package, using MPI can lead to a significant performance increase, as it results in better GPU utilization and parallelizes (and thus significantly speeds up) the non-GPU code (and if you have ever looked at Amdahl’s law, that can make a significant difference, especially when using a GPU).

OpenMPI now installed.

it would also be interesting to know what specific GPU hardware you have. not every GPU that is CUDA compatible is worth the hassle; some have been known to slow down calculations when the number of GPU cores is small and the memory bandwidth inside the GPU and between GPU and main memory is limited. This is significantly amplified if you are doing calculations (e.g. using fix npt or other variable cell algorithms) that depend on a very accurate stress tensor (which - unlike forces - typically has a large error in single precision and still a substantial error in mixed precision) and thus require compiling in double precision mode. most consumer grade GPUs have only limited support for double precision floating point math.

My present graphics card is an NVIDIA GT 710 with 1 GB memory. I don’t consider it a serious device for computing, but more useful for learning how to build and run GPU-based programs, and it has a fairly recent (Pascal) architecture. I have already built some codes using OpenACC. They run more slowly than the CPU version, but I have figured out how to keep the big pieces of data on the GPUs without lots of transfer to the CPU, and I can envision a competitive speed up with a faster consumer card. No experience to judge LAMMPS. I believe I can get XSEDE or other HPC resources as needed if operations get scaled up beyond my personal resources.

I’m thinking of purchasing a more powerful consumer card. I don’t have personal funds for, say, a V100. I’m aware of the 1/32 penalty for double precision on most NVIDIA cards, but I think a card in the GTX 16 or RTX 20 series would start to compete favorably with my 8-core CPU. It is not clear to me whether tensor cores provide an advantage except for matrix algebra solvers.

I have installed the NVIDIA HPC SDK, which includes the latest version of CUDA.

There are several issues with OpenMP and GPU:

  • The guide by Richard Berger (Temple U) is a broken link.

what link? most likely the information it would be pointing to is outdated by now, anyway.

At the beginning of Section 3.1 of my distribution’s (3 March 2020) doc html files, it says,

Richard Berger (Temple U) has also written a more comprehensive guide for how to use CMake to build LAMMPS. If you are new to CMake it is a good place to start.

with this link:
https://github.com/lammps/lammps/blob/master/cmake/README.md

I mistakenly thought that the most recent distribution would have a doc section that was more up to date than the web site.

besides, there are detailed and up-to-date instructions in the LAMMPS manual itself that explain how to compile, how to run and how to get the best performance when using OpenMP and MPI and GPUs.

  • I can enable OpenMP only via cmake with -D BUILD_OMP=yes. I did not find an equivalent option using make.

two comments on that. a) this is a required option to enable compiling with OpenMP support, but it does not automatically provide you with OpenMP-capable styles. those need to be enabled as well.
b) the explanation of the equivalent for GNU make is in this section: https://lammps.sandia.gov/doc/Build_basics.html#serial

  • I tried running cmake with -D GPU_API=cuda -D GPU_ARCH=sm_60. cmake ignored those options. No lmp_gpu executable was generated.

same as with OpenMP, adding those settings does not automatically include GPU code. those settings belong to the GPU package (which is one of two options to include GPU support in LAMMPS) and will only be considered if you also enable the corresponding package. without it, they are meaningless.

I’m afraid I don’t know what it means to “enable the corresponding package.” What I found in the “special installation instructions” for GPU was the following cmake options:

-D GPU_API=value             # value = opencl (default) or cuda or hip
-D GPU_PREC=value            # precision setting
                             # value = double or mixed (default) or single
-D OCL_TUNE=value            # hardware choice for GPU_API=opencl
                             # generic (default) or intel (Intel CPU) or fermi, kepler, cypress (NVIDIA)
-D GPU_ARCH=value            # primary GPU hardware choice for GPU_API=cuda
                             # value = sm_XX, see below
                             # default is sm_50
-D HIP_ARCH=value            # primary GPU hardware choice for GPU_API=hip
                             # value depends on selected HIP_PLATFORM
                             # default is 'gfx906' for HIP_PLATFORM=hcc and 'sm_50' for HIP_PLATFORM=nvcc
-D HIP_USE_DEVICE_SORT=value # enables GPU sorting
                             # value = yes (default) or no
-D CUDPP_OPT=value           # optimization setting for GPU_API=cuda
                             # enables CUDA Performance Primitives Optimizations
                             # value = yes (default) or no
-D CUDA_MPS_SUPPORT=value    # enables some tweaks required to run with active nvidia-cuda-mps daemon
                             # value = yes or no (default)

I fixed all parameters that I thought were associated with CUDA and not some other API. I did not run make to build lib/gpu.

when compiling with the cmake build system, the executable will be called “lmp” unless you explicitly request a machine name.
in general, if you want to compile LAMMPS from source, you need to pick one of the two build systems exclusively and follow only the instructions pertinent to that build system. in the LAMMPS manual, the corresponding sections are clearly marked as to which build system they apply.

I appreciate that I should consider there to be a firewall between cmake and make. I selected cmake because the documentation claimed that this was a superior method, though almost all my prior experience building things has been with make.

Since I am new to LAMMPS, I don’t know what to expect, but it appears that I cannot generate a code that will utilize my GPU.

it also appears that you need to spend more time reading the manual, especially sections 3, 6, 7, and 4.

I had gone through those sections before the prior post both for general build instructions and for including GPU support, but clearly I’m missing something. My understanding of the cmake build process is the following (from the build directory):

cmake -D OPTION1 -D OPTION2 … ../cmake
cmake --build .

So I’m either missing some ‘-D’ options or else some other steps.

Brad

[…]

My present graphics card is an NVIDIA GT 710 with 1 GB memory. I don’t consider it a serious device for computing, but more useful for learning how to build and run GPU-based programs, and it has a fairly recent (Pascal) architecture. I have already built some codes using OpenACC. They run more slowly than the CPU version, but I

According to data at Wikipedia, the GT 710 would be a Kepler architecture GPU, so you probably mixed up the actual name. it would be an entry level card with only a small fraction of the capability of a “proper” GPU. I also would not call the Pascal architecture “recent”. Since then we have had Volta, Turing, and Ampere.
When you compile LAMMPS correctly with the GPU package enabled, there is an executable nvc_get_devices which can report the GPU architecture as “Compute capability”. For Kepler it will return 3.x, while Pascal should be 6.x. FWIW, the original Kepler architecture (3.0) is no longer supported by the current version 11.0 of the CUDA toolkit.
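independent of LAMMPS, the compute capability can also be queried directly through the CUDA runtime API; a minimal sketch (compile with nvcc; requires a working CUDA installation and driver):

```cuda
// report name and compute capability of each visible CUDA device
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // a Kepler card reports 3.x here, a Pascal card 6.x
        printf("Device %d: %s (compute capability %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```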

have figured out how to keep the big pieces of data on the GPUs without lots of transfer to the CPU, and I can envision a competitive speed up with a faster consumer card. No experience to judge LAMMPS. I believe I can get XSEDE or other HPC resources as needed if operations get scaled up beyond my personal resources.

GPU speedup vs. CPU-only performance strongly depends on both the GPU and the CPU that you compare to. While a decade ago a GPU would clearly outpace a CPU, modern multi-core CPUs with vector units can be competitive. Also, the speedup depends strongly on the use case and on whether GPU support has actually been programmed for the kind of calculation you want to do. There are different approaches to GPU acceleration and data management in the KOKKOS package and the GPU package, and those have consequences for how to run efficiently and what kind of speedup to expect. This is all discussed at length in the manual. Please also note that only a subset of LAMMPS code has been ported to GPUs.
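the run-time invocation also differs between the two packages; roughly (input file name is a placeholder, rank and GPU counts are illustrative):

```shell
# GPU package: several MPI ranks can share one GPU, which often
# improves GPU utilization
mpirun -np 4 ./lmp -sf gpu -pk gpu 1 -in in.melt

# KOKKOS package: typically one MPI rank per GPU
mpirun -np 1 ./lmp -k on g 1 -sf kk -in in.melt
```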

I’m thinking of purchasing a more powerful consumer card. I don’t have personal funds for, say, a V100. I’m aware of the 1/32 penalty for double precision on most NVIDIA cards, but I think a card in the GTX 16 or RTX 20 series would start to compete favorably with my 8-core CPU. Not clear whether tensor cores provides an advantage except for matrix algebra solvers.

to the best of my knowledge, tensor cores have no impact on the GPU kernels in LAMMPS. Especially for the GPU package, you need to keep in mind that the majority of the code was written 10-15 years ago and predates the (now obsolete) Fermi architecture (though there have been some updates to take advantage of Fermi and later features). Choosing a good consumer GPU for GPU computing requires significant research effort. Often (but not always) Nvidia offers one particular GPU model that can be seen as a “Tesla in disguise”, with similar performance metrics (including improved double precision floating point support) at a much lower price than the Tesla offerings. If you only want to do testing and debugging, an “upper mid-level” card is sufficient. E.g. I have a desktop with a (fanless) GTX 1060 card with 6 GB RAM that allows occasional test runs but is not sufficient for production use.

I have installed the NVIDIA HPC SDK, which includes the latest version of CUDA.

There are several issues with OpenMP and GPU:

  • The guide by Richard Berger (Temple U) is a broken link.

what link? most likely the information it would be pointing to is outdated by now, anyway.

At the beginning of Section 3.1 of my distribution’s (3 March 2020) doc html files, it says,

Richard Berger (Temple U) has also written a more comprehensive guide for how to use CMake to build LAMMPS. If you are new to CMake it is a good place to start.

with this link:
https://github.com/lammps/lammps/blob/master/cmake/README.md

I mistakenly thought that the most recent distribution would have a doc section that was more up to date than the web site.

3 March 2020 is not the “most recent” version; it is the latest stable version of LAMMPS. there have been multiple patch releases since.
the link you refer to points to the latest development version of LAMMPS, where the document in question no longer exists because its contents were redundant; if you replace “master” in the URL with “stable”, you will find it. The file README.md should also be present in the cmake folder of your LAMMPS tarball. However, I would suggest looking at https://lammps.sandia.gov/doc/Howto_cmake.html which is the revised and updated cmake tutorial with the corresponding information (and then some) from the latest LAMMPS patch release (21 July 2020).

The manual at the lammps.sandia.gov website always represents the latest patch release of LAMMPS and thus is always more up-to-date than the stable source distribution.

[…]

I’m afraid I don’t know what it means to “enable the corresponding package.” What I found in the “special installation instructions” for GPU was the following cmake

which means, that you need to re-read https://lammps.sandia.gov/doc/Build_package.html and https://lammps.sandia.gov/doc/Packages.html and the pages linked from there.

[…]

I had gone through those sections before the prior post both for general build instructions and for including GPU support, but clearly I’m missing something. My understanding of the cmake build process is the following (from the build directory):

cmake -D OPTION1 -D OPTION2 … ../cmake
cmake --build .

So I’m either missing some ‘-D’ options or else some other steps.

there are no other steps, but the question is which options you use, and there you are missing a lot.
this is all information given in the sections of the manual that I pointed out to you. so please read them again, best in the online version (in case the section numbers have changed due to some refactoring of the manual since the last stable release). you may also benefit from using the latest patch version (i think it is labeled “development version” on the sandia homepage). you can also download it from here: https://github.com/lammps/lammps/releases

axel

If I go to section 6.3.8 GPU package, there is

This package has specific installation instructions on the Build extras doc page.

If I click on that link, I get to section 3.7.2 GPU package. So far I’m trying to build with CMake and not make. The primary options seem to be

-D PKG_GPU=yes -D GPU_API=cuda -D GPU_ARCH=sm_30

The other options pertain to HIP or OpenCL.

When I try to build, I get at the end:

CMake Error at /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
Could NOT find CUDA (missing: CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) (found
version “10.2”)
Call Stack (most recent call first):
/usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
/usr/share/cmake-3.10/Modules/FindCUDA.cmake:1080 (find_package_handle_standard_args)
Modules/Packages/GPU.cmake:34 (find_package)
CMakeLists.txt:466 (include)

– Configuring incomplete, errors occurred!

So it looks like cmake could not find the relevant CUDA files.

I am running with the NVIDIA HPC SDK, which has at least two relevant paths:

/opt/nvidia/hpc_sdk/Linux_x86_64/cuda/11.0 [include and lib files, plus profiler] (or 10.1 or 10.2)

/opt/nvidia/hpc_sdk/Linux_x86_64/20.5/compilers/bin/nvc++ [compiler]

It is certainly not CUDA_HOME=/usr/local/cuda, and setting CUDA_HOME alone will not capture the compiler.

There is some mention of specific make or environment variables that can be set if one uses make; it is less clear to me what to do if I use cmake, given that I’m not supposed to do a hybrid build.

So either

  • I need to use make
  • I need to find specific instructions for cmake to find the relevant CUDA folders
  • I have failed yet again to understand the instructions.

if cmake fails to find the CUDA toolkit, you have to consult the CMake documentation on how to make it find it. we cannot provide all those CMake-specific details in the LAMMPS documentation since they are not really LAMMPS issues. It works for us on every container image and every machine we compile on.

since you seem to be using CMake 3.10 (ubuntu 18.04?) this would be
at: https://cmake.org/cmake/help/v3.10/module/FindCUDA.html

your reference to CUDA_HOME is definitely mixing the legacy build instructions with the CMake build process.
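for what it is worth, the old FindCUDA module (used by CMake 3.10) documents the cache variable CUDA_TOOLKIT_ROOT_DIR for pointing it at a toolkit outside /usr/local/cuda; a sketch using the HPC SDK path quoted above:

```shell
cmake -D PKG_GPU=yes -D GPU_API=cuda -D GPU_ARCH=sm_60 \
      -D CUDA_TOOLKIT_ROOT_DIR=/opt/nvidia/hpc_sdk/Linux_x86_64/cuda/11.0 \
      ../cmake
```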

Problem solved.

First I installed CMake 3.18, which is better at ‘finding’ CUDA.

Second, I discovered from the NVIDIA forum moderator that my NVIDIA HPC Developer package contained both an “HPC” compiler and a CUDA compiler, and they are not the same entity, even though they have the same name nvcc. When I made sure all environment variables pointed to the CUDA tree (and not HPC), the LAMMPS build proceeded without error.

Thanks for your help with this.