gmake error during CMake build with Intel compiler

  1. What I tried to do.
    Compile the 2 Aug 2023 stable release on a local Linux server cluster with:
  • Intel compiler 21.3
  • CMake 3.19.5
  • GCC 9.3.0
  2. What happened.
    Configuration with CMake completed without errors, but the build then crashed with the following gmake error:
[ 95%] Building CXX object CMakeFiles/lammps.dir/home/Sourcecode_LAMMPS_20240305_stablerelease_20230802ver/src/KOKKOS/pair_reaxff_kokkos.cpp.o
icpc: command line warning #10121: overriding '-xHost' with '-xCORE-AVX512'
": internal error: 101003_1112

compilation aborted for /home/Sourcecode_LAMMPS_20240305_stablerelease_20230802ver/src/KOKKOS/pair_reaxff_kokkos.cpp (code 4)
gmake[2]: *** [CMakeFiles/lammps.dir/home/Sourcecode_LAMMPS_20240305_stablerelease_20230802ver/src/KOKKOS/pair_reaxff_kokkos.cpp.o] Error 4
gmake[1]: *** [CMakeFiles/lammps.dir/all] Error 2
gmake: *** [all] Error 2

It looks like the KOKKOS ReaxFF source triggers the error, but I don’t know what the problem is. How can I get past this gmake crash?

  3. Local CPU environment: Intel Skylake Xeon server CPU (AVX-512) architecture
grep avx /proc/cpuinfo | uniq: 
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt mba tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local ibpb ibrs stibp dtherm ida arat pln pts pku ospke spec_ctrl intel_stibp arch_capabilities
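To confirm AVX-512 support from a flags line like the one above, a small shell check works; the `flags` value below is an abbreviated stand-in for the full `/proc/cpuinfo` line, kept short for illustration:

```shell
# Check the cpuinfo flags for AVX-512 foundation support.
# On a live system use instead: flags=$(grep -m1 flags /proc/cpuinfo)
flags="avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl"
case " $flags " in
  *" avx512f "*) echo "AVX-512F present" ;;
  *)             echo "AVX-512F absent" ;;
esac
```

The `avx512f` flag is what justifies `Kokkos_ARCH_SKX` below.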
  4. CMake configuration.
    I tested with and without the “most” preset, but that made no difference.
cmake -C ../cmake/presets/myintel1_213full2_flag1.cmake \
-DCMAKE_INSTALL_PREFIX=/home/Sourcecode_LAMMPS_20240305_stablerelease_20230802ver \
-D PKG_KOKKOS=ON -D Kokkos_ARCH_SKX=yes \
-D Kokkos_ENABLE_CUDA=no -D Kokkos_ENABLE_OPENMP=yes -D Kokkos_ENABLE_SERIAL=yes \
-D PKG_ATC=yes -D PKG_MPIIO=yes -D BUILD_MPI=ON -D PKG_INTEL=yes -D INTEL_LRT_MODE=threads \
-D PKG_ASPHERE=yes -D PKG_CLASS2=yes -D PKG_KSPACE=yes \
-D PKG_MANYBODY=yes -D PKG_MISC=yes -D PKG_MOLECULE=yes \
-D PKG_RIGID=yes -D PKG_OPT=yes -D PKG_REPLICA=yes -D PKG_OPENMP=yes -D PKG_REAXFF=yes \
-D PKG_EXTRA-FIX=yes -D PKG_EXTRA-DUMP=yes -D PKG_EXTRA-COMPUTE=yes -D PKG_EXTRA-MOLECULE=yes -D PKG_EXTRA-PAIR=yes \
../cmake 2>&1| tee configure.log
  5. Intel preset configuration.
set(CMAKE_CXX_COMPILER "/lstr/applications/atom/compiler/oneAPI/2021.3.0.3219/compiler/2021.3.0/linux/bin/intel64/icpc" CACHE STRING "" FORCE)
set(CMAKE_C_COMPILER "/lstr/applications/atom/compiler/oneAPI/2021.3.0.3219/compiler/2021.3.0/linux/bin/intel64/icc" CACHE STRING "" FORCE)
set(CMAKE_Fortran_COMPILER "/lstr/applications/atom/compiler/oneAPI/2021.3.0.3219/compiler/2021.3.0/linux/bin/intel64/ifort" CACHE STRING "" FORCE)

set(MPI_CXX "/lstr/applications/atom/compiler/oneAPI/2021.3.0.3219/compiler/2021.3.0/linux/bin/intel64/icpc" CACHE STRING "" FORCE)
set(MPI_CXX_COMPILER "/lstr/applications/atom/compiler/oneAPI/2021.3.0.3219/mpi/2021.3.0/bin/mpicxx" CACHE STRING "" FORCE)

unset(HAVE_OMP_H_INCLUDE CACHE)
set(OpenMP_C "/lstr/applications/atom/compiler/oneAPI/2021.3.0.3219/compiler/2021.3.0/linux/bin/intel64/icc" CACHE STRING "" FORCE)
set(OpenMP_C_FLAGS "-qopenmp -qopenmp-simd" CACHE STRING "" FORCE)
set(OpenMP_C_LIB_NAMES "omp" CACHE STRING "" FORCE)
set(OpenMP_CXX "/lstr/applications/atom/compiler/oneAPI/2021.3.0.3219/compiler/2021.3.0/linux/bin/intel64/icpc" CACHE STRING "" FORCE)
set(OpenMP_CXX_FLAGS "-qopenmp -qopenmp-simd" CACHE STRING "" FORCE)
set(OpenMP_CXX_LIB_NAMES "omp" CACHE STRING "" FORCE)
set(OpenMP_Fortran_FLAGS "-qopenmp -qopenmp-simd" CACHE STRING "" FORCE)
set(OpenMP_omp_LIBRARY "libiomp5.so" CACHE PATH "" FORCE)

Internal compiler errors are a bug in the compiler. You need to use a more recent version of the Intel compiler.
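If upgrading is not immediately possible, a common workaround for an ICE confined to a single translation unit is to lower the optimization level for just that file. A hedged CMake sketch (here `${LAMMPS_SOURCE_DIR}` is a placeholder for your LAMMPS `src` directory, and `-O1` is only a guess at a level that avoids the ICE):

```cmake
# Workaround sketch: compile only the file that triggers the ICE at -O1,
# leaving the rest of the build at full optimization.
set_source_files_properties(
  ${LAMMPS_SOURCE_DIR}/KOKKOS/pair_reaxff_kokkos.cpp
  PROPERTIES COMPILE_OPTIONS "-O1")
```

This only sidesteps the crash; the real fix remains a newer compiler.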


Hello,

Thanks for the fast reply.

Under the same conditions and environment, I tried configuring and building with Intel 24.0 using the oneapi CMake preset, with the proper icpx, icx, ifx, and mpicxx paths.

But now I see the following gmake error:

[ 12%] Building CXX object CMakeFiles/lammps.dir/home/Sourcecode_LAMMPS_20240305_stablerelease_20230802ver/src/angle.cpp.o
icpx: error: unknown argument: '-qopenmp;-qopenmp-simd'
gmake[2]: *** [CMakeFiles/lammps.dir/home/Sourcecode_LAMMPS_20240305_stablerelease_20230802ver/src/angle.cpp.o] Error 1
gmake[1]: *** [CMakeFiles/lammps.dir/all] Error 2
gmake: *** [all] Error 2

I searched for “icpx: error: unknown argument: ‘-qopenmp;-qopenmp-simd’” and found this report: [BUG] OpenMP: omp.h not found by CMake when using oneapi preset · Issue #4033 · lammps/lammps (github.com)

It looks like the CMake version is the cause of this “-qopenmp;-qopenmp-simd” crash. Let me contact the system admin about a newer CMake, 3.28.1 or higher. If it crashes again, I will create a separate post.
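In case a newer CMake is slow to arrive, one workaround I may try first (an untested guess based on the error message, not a confirmed fix) is to use a single OpenMP flag in the preset, so that no semicolon-joined flag list can ever reach icpx:

```cmake
# Untested workaround guess: a single OpenMP flag per language means the
# older FindOpenMP code path cannot join two flags with a semicolon.
set(OpenMP_C_FLAGS   "-qopenmp" CACHE STRING "" FORCE)
set(OpenMP_CXX_FLAGS "-qopenmp" CACHE STRING "" FORCE)
```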

Thanks!

Why don’t you try compiling with gcc or clang first?

Outside of the INTEL package, there is not much benefit to using the Intel compilers over gcc or clang, if any at all.

I’m in the middle of running my own benchmark tests of several interatomic potentials on several local clusters.

During my tests on another cluster with a different CPU environment, I found that some LAMMPS simulations built with the Intel compiler ran faster than the OpenMPI + GCC builds, especially when I applied the OMP or Kokkos packages. (I didn’t test clang, though; let me try that later as well. I’d like to try the other acceleration packages too, but I just don’t have time for all of them…)

So I was hoping to compare the performance of the Intel and GCC builds of LAMMPS in the current cluster environment, to see whether the same advantage shows up here or not.

Faster by how much?

Please note that with today’s hardware, performance can be significantly affected by processor affinity choices, load from other processes, cooling, and the duration and size of the test runs.

On our cluster we see differences in cooling throughout a cabinet (e.g. a node at the very bottom gets warmer than higher up due to the cooling air from the floor rushing by faster) which translates into a significant performance difference since turboboost frequencies are different. Similarly, the performance can be different after running a job that used and warmed up all of the CPU cores and thus also warmed the entire node compared to a node that had been sitting idle for some time.

Finally, for code paths that are not well vectorized, enabling AVX can actually slow the CPU down, due to the additional heat generated and the fact that AVX code runs at lower clock frequencies than SSE or scalar code.
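To get comparable numbers between builds, it helps to pin threads explicitly before every run. An illustrative set of the standard OpenMP binding controls (the values here are examples, not recommendations, and the launch line is only a sketch):

```shell
# Illustrative OpenMP pinning controls for reproducible benchmarks;
# adjust the values to match your node layout and MPI rank count.
export OMP_NUM_THREADS=4
export OMP_PROC_BIND=close
export OMP_PLACES=cores
echo "threads=$OMP_NUM_THREADS bind=$OMP_PROC_BIND places=$OMP_PLACES"
# launch line (illustrative only): mpirun -np 8 lmp -in in.benchmark
```

Without pinning, the scheduler may migrate threads between runs, and the Intel-vs-GCC comparison measures placement luck as much as compiler quality.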

Your situation may be different, but I’ve found it helpful to install my own user copy of CMake. It’s not very big, and if you’re already supplying the paths to the various compilers and libraries manually, you shouldn’t have much trouble.
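Before building, a quick version comparison with `sort -V` tells you whether the cmake on your PATH is new enough; 3.28.1 is the minimum mentioned earlier in this thread, and the `have` value is hard-coded here for illustration:

```shell
# Compare an installed CMake version against the required minimum.
# On a real system, replace the hard-coded value with:
#   have=$(cmake --version | head -n1 | awk '{print $3}')
required=3.28.1
have=3.19.5
if [ "$(printf '%s\n' "$required" "$have" | sort -V | head -n1)" = "$required" ]; then
  echo "cmake $have is new enough (>= $required)"
else
  echo "cmake $have is too old (need >= $required)"
fi
```

The official Kitware binary tarballs unpack anywhere under `$HOME`, so no root access is needed for a user-local install.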
