Alex,
Thank you very much for the detailed comments. Some followup notes:
When asking questions about using or installing LAMMPS, please always mention the exact LAMMPS version that you are using. In my answers below I am assuming that this is either the latest stable release (3 March 2020) or a later patch release.
Yes - 3 March 2020, downloaded very recently. The CPU is a Ryzen 7 2700x running Ubuntu 18.04. Comments on GPU below.
I’m having trouble building LAMMPS with GPU support.
I have an 8-core AMD CPU together with a Pascal architecture graphics card. My card has support for OpenMP (which I already use for CPU-only programs), but not MPI, and I have no interest in installing MPI support.
but you should seriously consider installing an MPI library. if you run on a local machine with a linux distribution, installing an MPI library is trivial, as all distributions have pre-compiled packages. LAMMPS has been designed from the ground up to support MPI parallelization very efficiently and - except in some extreme cases - is almost always more efficient when using MPI than OpenMP, because the domain decomposition strategy implicitly leads to better data locality and thus better cache efficiency, which is hugely important for good performance on modern CPUs. OpenMP support, in contrast, is “grafted on” and - by construction - only applied to parts of the calculation, and with the current implementation its overhead tends to grow with more threads. specifically in combination with the GPU package, using MPI can lead to a significant performance increase, as it will lead to better GPU utilization and parallelize (and thus significantly speed up) the non-GPU code (and if you have ever looked at Amdahl’s law, that can make a significant difference, especially when using a GPU).
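To put rough numbers on the Amdahl's law remark, here is a small sketch. The 90% GPU-accelerated fraction, the 20x GPU speedup and the 8 MPI ranks are made-up illustrative values, not LAMMPS measurements, and the helper name `amdahl` is hypothetical.

```shell
# Overall speedup when a fraction p of the run time is accelerated by a
# factor s (the GPU part) and the remaining (1 - p) is spread over n ranks.
# All numbers used below are illustrative assumptions, not benchmarks.
amdahl() {
    awk -v p="$1" -v s="$2" -v n="$3" \
        'BEGIN { printf "%.2f\n", 1 / ((1 - p) / n + p / s) }'
}

amdahl 0.9 20 1   # non-GPU code on a single rank: prints 6.90
amdahl 0.9 20 8   # non-GPU code over 8 MPI ranks: prints 17.39
```

With these assumed values the non-GPU 10% dominates the run time until it is parallelized as well, which is the point of the remark above.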
OpenMPI now installed.
it would also be interesting to know what specific GPU hardware you have. not every GPU that is CUDA compatible is worth the hassle; some have been known to slow down calculations if the number of GPU cores is small and the memory bandwidth inside the GPU and between GPU and main memory is limited. This will be even more pronounced if you are doing calculations (e.g. using fix npt or other variable cell algorithms) that depend on very accurate computation of the stress tensor (which - unlike forces - typically has a large error in single precision and still a substantial error with mixed precision) and thus will require compiling in double precision mode. most consumer grade GPUs have only limited support for double precision floating point math.
My present graphics card is an NVIDIA GT 710 with 1 GB memory. I don’t consider it a serious device for computing, but more useful for learning how to build and run GPU-based programs, and it has a fairly recent (Pascal) architecture. I have already built some codes using OpenACC. They run more slowly than the CPU version, but I have figured out how to keep the big pieces of data on the GPUs without lots of transfer to the CPU, and I can envision a competitive speed up with a faster consumer card. No experience to judge LAMMPS. I believe I can get XSEDE or other HPC resources as needed if operations get scaled up beyond my personal resources.
I’m thinking of purchasing a more powerful consumer card. I don’t have personal funds for, say, a V100. I’m aware of the 1/32 penalty for double precision on most NVIDIA cards, but I think a card in the GTX 16 or RTX 20 series would start to compete favorably with my 8-core CPU. It is not clear to me whether tensor cores provide an advantage except for matrix algebra solvers.
I have installed the NVIDIA HPC SDK, which includes the latest version of CUDA.
There are several issues with OpenMP and GPU:
- The guide by Richard Berger (Temple U) is a broken link.
what link? most likely the information it would be pointing to is outdated by now, anyway.
At the beginning of Section 3.1 of my distribution’s (3 March 2020) doc html files, it says,
Richard Berger (Temple U) has also written a more comprehensive guide for how to use CMake to build LAMMPS. If you are new to CMake it is a good place to start.
with this link:
https://github.com/lammps/lammps/blob/master/cmake/README.md
I mistakenly thought that the most recent distribution would have a doc section that was more up to date than the web site.
besides, there are detailed and up-to-date instructions in the LAMMPS manual itself that explain how to compile, how to run and how to get the best performance when using OpenMP and MPI and GPUs.
- I can enable OpenMP only via cmake with -D BUILD_OMP=yes. I did not find an equivalent option using make.
two comments on that. a) this is a required option to enable compiling with OpenMP support, but it does not automatically provide you with OpenMP compatible styles. those need to be enabled as well.
b) the explanation of the equivalent for using GNU make is in this section: https://lammps.sandia.gov/doc/Build_basics.html#serial
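As a rough sketch of point a) for the cmake build: the compiler switch and the package that supplies the threaded styles are two separate settings. The package name `USER-OMP` (and hence `PKG_USER-OMP`) is an assumption about the 3 March 2020 release and should be checked against the package list in the manual.

```shell
#!/bin/sh
# BUILD_OMP only turns on OpenMP compilation; the threaded styles come from
# a package that has to be enabled separately.
# PKG_USER-OMP is an assumed package switch for the 3 March 2020 release.
OMP_ARGS="-D BUILD_OMP=yes -D PKG_USER-OMP=yes"
CONFIGURE_CMD="cmake $OMP_ARGS ../cmake"
echo "$CONFIGURE_CMD"

# Dry run by default; set RUN_CMAKE=1 and start this from the build
# directory to actually run the configure step.
if [ "${RUN_CMAKE:-0}" = "1" ]; then
    # left unquoted on purpose so the string splits into separate arguments
    $CONFIGURE_CMD
fi
```

At run time the threaded variants would then presumably be selected with a suffix flag such as `-sf omp` on the LAMMPS command line.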
- I tried running cmake with -D GPU_API=cuda -D GPU_ARCH=sm_60. cmake ignored those options. No lmp_gpu executable was generated.
same as with OpenMP, adding those settings does not automatically include GPU code. those settings are part of the GPU package (which is one of two options to include GPU support in LAMMPS) and will only be considered if you also enable the corresponding package. they are meaningless without it.
I’m afraid I don’t know what it means to “enable the corresponding package.” What I found in the “special installation instructions” for GPU was the following cmake options:
-D GPU_API=value             # value = opencl (default) or cuda or hip
-D GPU_PREC=value            # precision setting
                             # value = double or mixed (default) or single
-D OCL_TUNE=value            # hardware choice for GPU_API=opencl
                             # generic (default) or intel (Intel CPU) or fermi, kepler, cypress (NVIDIA)
-D GPU_ARCH=value            # primary GPU hardware choice for GPU_API=cuda
                             # value = sm_XX, see below
                             # default is sm_50
-D HIP_ARCH=value            # primary GPU hardware choice for GPU_API=hip
                             # value depends on selected HIP_PLATFORM
                             # default is 'gfx906' for HIP_PLATFORM=hcc and 'sm_50' for HIP_PLATFORM=nvcc
-D HIP_USE_DEVICE_SORT=value # enables GPU sorting
                             # value = yes (default) or no
-D CUDPP_OPT=value           # optimization setting for GPU_API=cuda
                             # enables CUDA Performance Primitives Optimizations
                             # value = yes (default) or no
-D CUDA_MPS_SUPPORT=value    # enables some tweaks required to run with active nvidia-cuda-mps daemon
                             # value = yes or no (default)
I fixed all parameters that I thought were associated with CUDA and not some other API. I did not run make to build lib/gpu.
when compiling with the cmake build system, the executable will be called “lmp” unless you explicitly request a machine name.
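a minimal sketch of that naming rule, assuming the machine name is passed through the `LAMMPS_MACHINE` cmake setting. the helper `exe_name`, the rank count and the input file `in.example` are hypothetical and only there for illustration.

```shell
# Name of the executable produced by the cmake build:
# "lmp" by default, "lmp_<name>" when a machine name is requested
# (assumed to be set via -D LAMMPS_MACHINE=<name>).
exe_name() {
    if [ -n "$1" ]; then
        echo "lmp_$1"
    else
        echo "lmp"
    fi
}

exe_name ""      # prints lmp
exe_name "gpu"   # prints lmp_gpu

# Example launch line (printed only, not executed); in.example is a
# placeholder input file and 8 ranks is an arbitrary choice.
echo "mpirun -np 8 $(exe_name gpu) -sf gpu -pk gpu 1 -in in.example"
```

so an `lmp_gpu` binary would only appear if the name "gpu" was requested explicitly at configure time.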
in general, if you want to compile LAMMPS from source, you need to pick one of the two build systems exclusively and follow only the instructions pertinent to that build system. in the LAMMPS manual, the corresponding sections are clearly marked with the build system they apply to.
I appreciate that I should consider there to be a firewall between cmake and make. I selected cmake because the documentation claimed that this was a superior method, though almost all my prior experience building things has been with make.
Since I am new to LAMMPS, I don’t know what to expect, but it appears that I cannot generate a code that will utilize my GPU.
it also appears that you need to spend more time reading the manual, especially sections 3, 6, 7, and 4.
I had gone through those sections before the prior post both for general build instructions and for including GPU support, but clearly I’m missing something. My understanding of the cmake build process is the following (from the build directory):
/ cmake -D option1 -D option2 … ../cmake
/ cmake --build .
So I’m either missing some ‘-D’ options or else some other steps.
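For concreteness, the two steps could be spelled out as in the sketch below. The `-D PKG_GPU=on` switch is an assumption about what "enabling the corresponding package" refers to, `-D BUILD_MPI=yes` is added because OpenMPI is now installed, and `sm_60` is simply the value tried earlier, which may not match the actual card.

```shell
#!/bin/sh
# Two-step cmake build, meant to be started from an empty build directory
# that sits next to the "cmake" folder of the LAMMPS source tree.
# PKG_GPU=on is the assumed switch that pulls in the GPU package; without
# it the GPU_* settings would presumably be ignored.
GPU_ARCH="sm_60"   # value tried in this thread; must match the real hardware
CMAKE_ARGS="-D PKG_GPU=on -D GPU_API=cuda -D GPU_ARCH=$GPU_ARCH -D BUILD_MPI=yes"

echo "step 1: cmake $CMAKE_ARGS ../cmake"
echo "step 2: cmake --build ."

# Dry run by default; set RUN_BUILD=1 to actually configure and compile.
if [ "${RUN_BUILD:-0}" = "1" ]; then
    # $CMAKE_ARGS is left unquoted on purpose so it splits into options
    cmake $CMAKE_ARGS ../cmake && cmake --build .
fi
```

Run as is, the script only prints the two commands, so the option list can be checked before anything is compiled.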
Brad