Universal LAMMPS binaries for Linux

akohlmey · January 27, 2023, 4:21pm

Dear LAMMPS Users,

Today I would like to draw your attention to an experiment that may help people to get started with LAMMPS faster and lower the barrier to test your inputs with the latest LAMMPS versions before reporting a bug. Most people will need to compile LAMMPS themselves from source and that can be a challenging tasks, if you are lacking the experience. Compilation with additional packages and their libraries is even more challenging. When installing through a package manager, you are, on the other hand, limited by the choices of the person packaging LAMMPS. So the intent is to provide an executable that requires no compilation or special installation, but you can just unpack the archive and get started by running it.

If you go to this LAMMPS Downloads website, you can now download tar archives with precompiled LAMMPS binaries that should work on any 64-bit x86 Linux machine. These have been built with almost all packages included (see below). Only MPI and Python support are missing, since MPI has to match the locally installed MPI library, and Python requires bundling a complete Python installation. All you need to run LAMMPS is the LAMMPS executable “lmp”, so it can be moved to a different location without worrying about (shared) libraries etc. Several add-on tools are included as well.

I am making this post in the forum now in order to get some crucial feedback from users. What I want to know is the following:

Does the executable run on your Linux machine? Please report if not, and provide details about your Linux OS and hardware
Is this useful? Specifically, have you tried to report a bug and had to struggle because you were asked to test your input with the latest version of LAMMPS?
Is this sufficient for your production calculations (i.e. is having only OpenMP parallelization via the OPENMP and INTEL package and no MPI support running your simulations fast enough)?
Is there something missing that is important to you?

Any additional suggestions are welcome, too.

Thanks in advance for any feedback.
Axel.

Here are the specs of the first available archive built from the latest stable release sources. More to come.

Compiler: GNU C++ 11.2.0 with OpenMP 4.5
C++ standard: C++11
MPI v1.0: LAMMPS MPI STUBS for LAMMPS version 23 Jun 2022

Accelerator configuration:

KOKKOS package API: OpenMP Serial
KOKKOS package precision: double
OPENMP package API: OpenMP
OPENMP package precision: double
INTEL package API: OpenMP
INTEL package precision: single mixed double

Active compile time flags:

-DLAMMPS_GZIP
-DLAMMPS_PNG
-DLAMMPS_JPEG
-DLAMMPS_FFMPEG
-DLAMMPS_EXCEPTIONS
-DLAMMPS_SMALLBIG
sizeof(smallint): 32-bit
sizeof(imageint): 32-bit
sizeof(tagint):   32-bit
sizeof(bigint):   64-bit

Available compression formats:

Extension: .gz     Command: gzip
Extension: .bz2    Command: bzip2
Extension: .zst    Command: zstd
Extension: .xz     Command: xz
Extension: .lzma   Command: xz
Extension: .lz4    Command: lz4
Installed packages:

ASPHERE ATC AWPMD BOCS BODY BPM BROWNIAN CG-DNA CG-SDK CLASS2 COLLOID COLVARS 
COMPRESS CORESHELL DIELECTRIC DIFFRACTION DIPOLE DPD-BASIC DPD-MESO DPD-REACT 
DPD-SMOOTH DRUDE EFF ELECTRODE EXTRA-COMPUTE EXTRA-DUMP EXTRA-FIX 
EXTRA-MOLECULE EXTRA-PAIR FEP GRANULAR INTEL INTERLAYER KOKKOS KSPACE MACHDYN 
MANIFOLD MANYBODY MC MEAM MESONT MGPT MISC ML-IAP ML-PACE ML-RANN ML-SNAP 
MOFFF MOLECULE MOLFILE OPENMP OPT ORIENT PERI PHONON PLUGIN POEMS PTM QEQ QTB 
REACTION REAXFF REPLICA RIGID SHOCK SMTBQ SPH SPIN SRD TALLY UEF VORONOI YAFF

Michael_Jacobs · January 27, 2023, 4:59pm

Unsurprisingly, this gives a cannot execute binary file: Exec format error on the Power9 CPUs of Summit (ORNL). Maybe it works with all x86_64 Linux machines, but not ppc64

Even if it did work, though, I wouldn’t likely use it, as it’s missing GPU acceleration, which I use for coarse-grained simulations, and I expect that’s not something that’s reasonable to precompile.

akohlmey · January 27, 2023, 5:07pm

Sorry about the vagueness. This was meant to be for x86-only. That doesn’t mean that power is not possible, it just requires a different binary. I don’t expect people who have access to machines like Summit to have problems compiling executables, though.

Even worse, it is not possible. These are “true static” binaries, which means they cannot load any kind of shared library or object at runtime unless it is not fully self-contained, i.e. has no dependency on libc.so.x or libcuda.so.x etc. The most portable way to include GPU support would be the GPU package in OpenCL mode. We do that with the Windows binaries, but that requires loading the platform specific accelerator support via shared objects, so it is not possible here.

That said, it is quite impressive that it would be possible to compile executables on Fedora 37, that can also run on Ubuntu 16.04LTS and Ubuntu 22.04LTS at the same time. Normally that would cause all kinds of issues with the shared libraries.

akohlmey · January 27, 2023, 5:46pm

Update: now there is a version of the latest development version in addition to the latest stable version. Both variants now also include Kokkos with OpenMP as additional package.

srtee · January 27, 2023, 10:52pm

I am a bit worried about supporting INTEL in these binaries – as far as I know their kspace styles have not been unit tested.

Other than that, I will give it a shot on my clusters!

akohlmey · January 27, 2023, 11:34pm

There are multiple reasons for that. The most important is that these styles are very aggressive in how they optimize and they have different heuristics for some of the automatic parameters.
So the results are a bit more approximate than for the other kspace styles. The unit tests don’t really test for correctness (they break for many tests with very aggressive optimization) but for reproducibility.

The biggest problem with the INTEL package in these binaries is that those binaries are not compiled with the Intel compilers, so they are missing many of the SIMD optimizations, so the OPENMP package code is often just as fast and sometimes even faster.

Here is a comparison of the rhodo benchmark output (openmp is left, default is middle, intel is right):

Per MPI rank memory allocation (min/avg/max) = 147.9 | 147.9 | 147.9 Mbytes|  Per MPI rank memory allocation (min/avg/max) = 140 | 140 | 140 Mbytes      |  Per MPI rank memory allocation (min/avg/max) = 387.6 | 387.6 | 387.6 Mbyte
  ------------ Step              0 ----- CPU =            0 (sec) -----------|  ------------ Step              0 ----- CPU =            0 (sec) -----------|  ------------ Step              0 ----- CPU =            0 (sec) ----------
  TotEng   =    -25356.2064 KinEng   =     21444.8313 Temp     =       299.03|  TotEng   =    -25356.2064 KinEng   =     21444.8313 Temp     =       299.03|  TotEng   =    -25356.2400 KinEng   =     21444.8313 Temp     =       299.0
  PotEng   =    -46801.0377 E_bond   =      2537.9940 E_angle  =     10921.37|  PotEng   =    -46801.0377 E_bond   =      2537.9940 E_angle  =     10921.37|  PotEng   =    -46801.0713 E_bond   =      2537.9940 E_angle  =     10921.3
  E_dihed  =      5211.7865 E_impro  =       213.5116 E_vdwl   =     -2307.86|  E_dihed  =      5211.7865 E_impro  =       213.5116 E_vdwl   =     -2307.86|  E_dihed  =      5211.7865 E_impro  =       213.5116 E_vdwl   =     -2307.8
  E_coul   =    207025.8927 E_long   =   -270403.7333 Press    =      -149.33|  E_coul   =    207025.8927 E_long   =   -270403.7333 Press    =      -149.33|  E_coul   =    207025.8948 E_long   =   -270403.7690 Press    =      -149.3
  Volume   =    307995.0335                                                  |  Volume   =    307995.0335                                                  |  Volume   =    307995.0335                                                 
  ------------ Step             50 ----- CPU =     3.106687 (sec) -----------|  ------------ Step             50 ----- CPU =     10.54978 (sec) -----------|  ------------ Step             50 ----- CPU =     4.407598 (sec) ----------
  TotEng   =    -25330.0317 KinEng   =     21501.0005 Temp     =       299.82|  TotEng   =    -25330.0317 KinEng   =     21501.0005 Temp     =       299.82|  TotEng   =    -25329.9832 KinEng   =     21500.9322 Temp     =       299.8
  PotEng   =    -46831.0321 E_bond   =      2471.7035 E_angle  =     10836.51|  PotEng   =    -46831.0321 E_bond   =      2471.7035 E_angle  =     10836.51|  PotEng   =    -46830.9155 E_bond   =      2471.6955 E_angle  =     10836.5
  E_dihed  =      5239.6319 E_impro  =       227.1218 E_vdwl   =     -1993.28|  E_dihed  =      5239.6319 E_impro  =       227.1218 E_vdwl   =     -1993.28|  E_dihed  =      5239.6327 E_impro  =       227.1227 E_vdwl   =     -1993.2
  E_coul   =    206797.6802 E_long   =   -270410.3926 Press    =       237.65|  E_coul   =    206797.6802 E_long   =   -270410.3926 Press    =       237.65|  E_coul   =    206797.6805 E_long   =   -270410.3113 Press    =       237.6
  Volume   =    308031.6762                                                  |  Volume   =    308031.6762                                                  |  Volume   =    308031.6775                                                 
  ------------ Step            100 ----- CPU =     6.387671 (sec) -----------|  ------------ Step            100 ----- CPU =     21.51646 (sec) -----------|  ------------ Step            100 ----- CPU =     9.069155 (sec) ----------
  TotEng   =    -25290.7304 KinEng   =     21591.9084 Temp     =       301.09|  TotEng   =    -25290.7303 KinEng   =     21591.9084 Temp     =       301.09|  TotEng   =    -25290.6858 KinEng   =     21591.6327 Temp     =       301.0
  PotEng   =    -46882.6388 E_bond   =      2567.9807 E_angle  =     10781.95|  PotEng   =    -46882.6387 E_bond   =      2567.9807 E_angle  =     10781.95|  PotEng   =    -46882.3185 E_bond   =      2567.9907 E_angle  =     10781.9
  E_dihed  =      5198.7492 E_impro  =       216.7864 E_vdwl   =     -1902.66|  E_dihed  =      5198.7492 E_impro  =       216.7864 E_vdwl   =     -1902.66|  E_dihed  =      5198.7452 E_impro  =       216.7865 E_vdwl   =     -1902.9
  E_coul   =    206659.5226 E_long   =   -270404.9730 Press    =         6.74|  E_coul   =    206659.5226 E_long   =   -270404.9730 Press    =         6.74|  E_coul   =    206660.0501 E_long   =   -270404.9686 Press    =         6.4
  Volume   =    308134.2285                                                  |  Volume   =    308134.2285                                                  |  Volume   =    308134.2375                                                 
  Loop time of 6.3877 on 4 procs for 100 steps with 32000 atoms              |  Loop time of 21.5165 on 1 procs for 100 steps with 32000 atoms             |  Loop time of 9.06919 on 4 procs for 100 steps with 32000 atoms            
                                                                             |                                                                             |                                                                            
  Performance: 2.705 ns/day, 8.872 hours/ns, 15.655 timesteps/s, 500.963 kato|  Performance: 0.803 ns/day, 29.884 hours/ns, 4.648 timesteps/s, 148.723 kato|  Performance: 1.905 ns/day, 12.596 hours/ns, 11.026 timesteps/s, 352.843 ka
  398.6% CPU use with 1 MPI tasks x 4 OpenMP threads                         |  99.9% CPU use with 1 MPI tasks x 1 OpenMP threads                          |  398.6% CPU use with 1 MPI tasks x 4 OpenMP threads                        
                                                                             |                                                                             |                                                                            
  MPI task timing breakdown:                                                 |  MPI task timing breakdown:                                                 |  MPI task timing breakdown:                                                
  Section |  min time  |  avg time  |  max time  |%varavg| %total            |  Section |  min time  |  avg time  |  max time  |%varavg| %total            |  Section |  min time  |  avg time  |  max time  |%varavg| %total           
  ---------------------------------------------------------------            |  ---------------------------------------------------------------            |  ---------------------------------------------------------------           
log.openmp                                                 98,1           80% log.plain                                                  94,1           80% log.intel                                                 102,1          81

akohlmey · February 3, 2023, 5:21pm

Just posted new packages with the latest updates to the develop branch and the update branch for the stable release to the LAMMPS download area

Please give them a try and let me know of any problems. This time even more packages are inlcuded and also the LAMMPS Shell can now be built as true static binary and is included.

For your convenience, I’ve added matching PDF versions of the LAMMPS manual for download.

hothello · February 22, 2023, 5:19pm

Hi Axel,

I found the universal binary very useful, especially when working on university servers whose OS is outdated and unable to compile the latest version of LAMMPS. I can happily say that the universal binary works on Ubuntu Xenial (16.04).

Can you share the exact procedure to compile a universal binary? I’d like to deploy my custom-modified version, too

akohlmey · February 22, 2023, 5:31pm

This is a bit tricky business. Let me first give you the outline so you can tell me, if you feel up to it, before I look up and reconstruct all the necessary steps and details.

You first need to build a Linux-2-Linux cross-compiler that uses musl-libc instead of glibc
Then you need to create a suitable CMake toolchain file
Then you need to build a few libraries with the cross-compiler so that more packages and tools can be included
Then you need to check out a sufficiently recent LAMMPS version
Then build the LAMMPS and other executables with CMake using the custom toolchain file

For 1. I found a set of scripts on GitHub, for 2. I looked at the file for the Linux-2-Windows cross compiler that exists as RPMs for Fedora Linux. I don’t recall how much porting was required for 3. Having to compile libtermcap certainly took me back to my “salad days”, but that is only needed for libreadline, which in turn is required for the lammps shell only. 4. and 5. are straightforward, if the rest was successful.

hothello · February 22, 2023, 5:35pm

Thanks for the feedback. The procedure is definitely outside my comfort zone: I will ask for help and get back to you if there is a chance we will succeed.

akohlmey · February 22, 2023, 5:46pm

I understand. I have been building compilers and cross-compilers (and using cross-compilers, too) from source for a living since I was a grad student some 20+ years ago. So these things don’t scare me anymore. One of the few perks of being an old fart that has been around the block a few times.

akohlmey · February 22, 2023, 6:00pm

On second thought, I might be able to bundle my copy of the cross-compiler and the libraries into a singularity/apptainer container based on the same Fedora Linux version that I am using. That would reduce the steps you would have to do to the the last two (plus installing apptainer, if not already available).

hothello · February 22, 2023, 6:08pm

Axel, thank you for your offer but let me first try to solve the issue with the help of my colleagues. Actually, there is only one file that gives an error, otherwise, the unmodified files still compile with the standard make. Sadly, it was never on my radar to become a software engineer --despite having enjoyed using Linux since 1998 (I think it was a Mandrake CD attached to a computer magazine).
Nostalgia mode: off!

akohlmey · February 22, 2023, 6:09pm

Which file, which error?

hothello · February 22, 2023, 6:15pm

It is a new bond style that Matteo Ricci developed for ellipsoids. When it’s time to push this stuff into the official LAMMPS pool, I will ask if you want to review the paper and the attached code. Meanwhile, you can see what it does in this video: https://youtu.be/kcTrsgIFwLU.

akohlmey · February 24, 2023, 11:19am

@hothello

I created the singularity/apptainer container image, anyway. It made sense for the symmetry of it. Now you can not only have LAMMPS binaries that can run on any x86_64 Linux box, but also create them on any platform where you can run the container.

The container is called fedora37_musl.sif and available alongside the archives with the binaries at LAMMPS Static Linux Binary Download Repository: .

After downloading the container image file and starting the container with, e.g.,: apptainer shell fedora37_musl.sif you can build LAMMPS with:

cmake -S /path/to/lammps/source/cmake -B build --toolchain /usr/musl/share/cmake/linux-musl.cmake ... (add other LAMMPS CMake configuration options here to your liking)
cmake --build build

And now you could exit the container and should find static binaries in the build folder.

hothello · February 24, 2023, 12:22pm

Amazing job, thank you so much!

bulba · June 5, 2024, 12:25pm

Dear Axel (or anyone who could help): I have downloaded, unpacked and run the “universal” build lammps-linux-x86_64-latest.tar.gz, I am using a CentOS Linux release 7.9.2009 (Core) computer with 32-core nodes, submitting using the SLURM script an MPI job results, unfortunately, in running 32-single string jobs. In the job submission script I used the “mpirun -srun lmp -i tip3p.in -l out.log” with lmp on the $PATH and with number of cores set to reflect the physical cores on the node. The problem lies in MPI library, but this seems to be well above my head. Could anyone help where to look at the MPI setup to run lmp based on it please? Thank you well in advance.
Taras

akohlmey · June 5, 2024, 1:23pm

This is the expected and documented behavior. It is impossible to have “universal” binaries that also are MPI compatible, since the MPI library used in the executable has to match the MPI library used on the individual machine. Since that MPI library is Linux version and machine specific, it cannot be included in generic binaries. If you run LAMMPS on a cluster, you want to compile a custom executable to have it optimized for your specific hardware. Below I am quoting the description from the beginning of this topic where it explains that MPI support is not included.

bulba · June 5, 2024, 7:46pm

Dear Axel, thank you very much for your quick answer. Understood… My path to own compiled version is strewn with stones, but that’s completely different story.