Keep getting this error "error: identifier "pthread_cond_clockwait" is undefined"

I cannot fix this error at all. I am trying to compile LAMMPS with the provided KOKKOS presets in the cmake folder, and I always run into this error. Does anyone have any ideas how to fix it?

[ 7%] Building CXX object lib/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Core.cpp.o
nvcc_wrapper - warning you have set multiple standard flags (-std=c++1* or --std=c++1*), only the last is used because nvcc can only accept a single std setting


[...] error: identifier "pthread_cond_clockwait" is undefined
pthread_cond_clockwait(&_M_cond, __m.native_handle(), __clock,
^

/usr/include/c++/11/mutex(271): error: identifier "pthread_mutex_clocklock" is undefined
{ return !pthread_mutex_clocklock(&_M_mutex, clockid, &__ts); }
^

/usr/include/c++/11/mutex(337): error: identifier "pthread_mutex_clocklock" is undefined
{ return !pthread_mutex_clocklock(&_M_mutex, clockid, &__ts); }

… and I have this recurring nightmare that people don't report all the crucial information when they ask for help. 🙁

To make some meaningful suggestions we need to know:

  • What is your LAMMPS version?
  • What is your OS and version?
  • What is your compiler version?
  • What is your CUDA toolkit version?
  • What are the exact commands that you have executed from downloading the source code, over configuring with CMake, to the actual compilation?
  • Have you set any environment variables? If yes, which ones to what value?

Thanks for the reply. Answers to your questions:

  1. LAMMPS version: latest, cloned from the repository just today
  2. OS: Ubuntu 22.04.3 LTS
  3. Compiler: tried with g++ 11 and g++ 13; the problem persists with both
  4. CUDA toolkit 12.9
  5. Commands run after cloning the git repository, up to the actual compilation (just like on the LAMMPS documentation website):
    mkdir build-kokkos-cuda
    cd build-kokkos-cuda
    cmake -C ../cmake/presets/basic.cmake \
          -C ../cmake/presets/kokkos-cuda.cmake ../cmake
    cmake --build .
  6. No, I have not set any environment variables.

ChatGPT says the "pthread_mutex_clocklock" error occurs if the glibc version is less than 2.30, but my version is 2.35.
I can compile the regular GPU package (PKG_GPU) without problems; I only run into this error with the KOKKOS package.
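For reference, the glibc claim is easy to verify from the shell. A minimal sketch (the header path assumes a standard Ubuntu install):

    # glibc 2.30 introduced pthread_cond_clockwait and pthread_mutex_clocklock
    ldd --version | head -n 1        # glibc version in use
    getconf GNU_LIBC_VERSION         # alternative way to query it
    # confirm the system pthread.h really declares the symbol libstdc++ expects
    grep -n "pthread_mutex_clocklock" /usr/include/pthread.h
    # and check which host compiler is being picked up
    which g++ && g++ --version | head -n 1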

Please post the output of ./lmp -h from a non-KOKKOS compilation.


Here is the output of ./lmp -h. The command used for building it was:

cmake \
  -D GPU=yes \
  -D GPU_API=cuda \
  -D GPU_PREC=mixed \
  -D GPU_ARCH=sm_89 \
  -D GPU_DEBUG=no \
  -D RIGID=on \
  ../cmake

Output of ./lmp -h

Large-scale Atomic/Molecular Massively Parallel Simulator - 12 Jun 2025 - Development
Git info (develop / patch_12Jun2025-67-g53fec5563c-modified)

Usage example: ./lmp -var t 300 -echo screen -in in.alloy

List of command-line options supported by this LAMMPS executable:

-echo none/screen/log/both : echoing of input script (-e)
-help : print this help message (-h)
-in none/filename : read input from file or stdin (default) (-i)
-kokkos on/off … : turn KOKKOS mode on or off (-k)
-log none/filename : where to send log output (-l)
-mdi ‘’ : pass flags to the MolSSI Driver Interface
-mpicolor color : which exe in a multi-exe mpirun cmd (-m)
-cite : select citation reminder style (-c)
-nocite : disable citation reminder (-nc)
-nonbuf : disable screen/logfile buffering (-nb)
-package style … : invoke package command (-pk)
-partition size1 size2 … : assign partition sizes (-p)
-plog basename : basename for partition logs (-pl)
-pscreen basename : basename for partition screens (-ps)
-restart2data rfile dfile … : convert restart to data file (-r2data)
-restart2dump rfile dgroup dstyle dfile …
: convert restart to dump file (-r2dump)
-restart2info rfile : print info about restart rfile (-r2info)
-reorder topology-specs : processor reordering (-r)
-screen none/filename : where to send screen output (-sc)
-skiprun : skip loops in run and minimize (-sr)
-suffix gpu/intel/kk/opt/omp: style suffix to apply (-sf)
-var varname value : set index style variable (-v)

OS: Linux “Ubuntu 22.04.3 LTS” 6.6.87.2-microsoft-standard-WSL2 x86_64

Compiler: GNU C++ 11.2.0 with OpenMP 4.5
C++ standard: C++17
Embedded fmt library version: 10.2.0
Embedded JSON class version: 3.12.0

MPI v3.1: Open MPI v5.0.8, package: Open MPI conda@74b3e81c684d Distribution, ident: 5.0.8, repo rev: v5.0.8, May 30, 2025

Accelerator configuration:

FFT information:

FFT precision = double
FFT engine = mpiFFT
FFT library = KISS

Active compile time flags:

-DLAMMPS_GZIP
-DLAMMPS_JPEG
-DLAMMPS_FFMPEG
-DLAMMPS_SMALLBIG
sizeof(smallint): 32-bit
sizeof(imageint): 32-bit
sizeof(tagint): 32-bit
sizeof(bigint): 64-bit

Available compression formats:

Extension: .gz Command: gzip
Extension: .bz2 Command: bzip2
Extension: .zst Command: zstd
Extension: .xz Command: xz
Extension: .lzma Command: xz
Extension: .lz4 Command: lz4

Installed packages:

List of individual style options included in this LAMMPS executable

  • Atom styles:

atomic body charge ellipsoid hybrid
line sphere tri

  • Integrate styles:

respa verlet

  • Minimize styles:

cg fire/old fire hftn quickmin
sd

  • Pair styles:

born buck buck/coul/cut coul/cut coul/debye
coul/dsf coul/wolf meam/c reax reax/c
mesont/tpm hybrid hybrid/omp hybrid/molecular
hybrid/molecular/omp hybrid/overlay hybrid/overlay/omp
hybrid/scaled hybrid/scaled/omp lj/cut lj/cut/coul/cut
lj/expand morse soft table yukawa
zbl zero

  • Bond styles:

hybrid zero

  • Angle styles:

hybrid zero

  • Dihedral styles:

hybrid zero

  • Improper styles:

hybrid zero

  • KSpace styles:

zero

  • Fix styles

adapt addforce ave/atom ave/chunk ave/correlate
ave/grid ave/histo ave/histo/weight ave/time
aveforce balance box/relax deform deposit
ave/spatial ave/spatial/sphere lb/pc
lb/rigid/pc/sphere reax/c/bonds reax/c/species dt/reset
efield enforce2d evaporate external gravity
halt heat indent langevin lineforce
momentum move nph nph/sphere npt
npt/sphere nve nve/limit nve/noforce nve/sphere
nvt nvt/sllod nvt/sphere pair planeforce
press/berendsen press/langevin print property/atom recenter
restrain set setforce spring spring/chunk
spring/self store/force store/state temp/berendsen temp/rescale
thermal/conductivity vector viscous wall/harmonic
wall/lj1043 wall/lj126 wall/lj93 wall/morse wall/reflect
wall/region wall/table

  • Compute styles:

aggregate/atom angle angle/local angmom/chunk bond
bond/local centro/atom centroid/stress/atom chunk/atom
chunk/spread/atom cluster/atom cna/atom com
com/chunk coord/atom count/type mesont dihedral
dihedral/local dipole dipole/chunk displace/atom erotate/sphere
erotate/sphere/atom fragment/atom global/atom group/group
gyration gyration/chunk heat/flux improper improper/local
inertia/chunk ke ke/atom msd msd/chunk
omega/chunk orientorder/atom pair pair/local
pe pe/atom pressure property/atom property/chunk
property/grid property/local rdf reduce reduce/chunk
reduce/region slice stress/atom temp temp/chunk
temp/com temp/deform temp/partial temp/profile temp/ramp
temp/region temp/sphere torque/chunk vacf vacf/chunk
vcm/chunk

  • Region styles:

block cone cylinder ellipsoid intersect
plane prism sphere union

  • Dump styles:

atom cfg custom atom/mpiio cfg/mpiio
custom/mpiio xyz/mpiio grid grid/vtk image
local movie xyz

  • Command styles

angle_write balance change_box create_atoms create_bonds
create_box delete_atoms delete_bonds box kim_init
kim_interactions kim_param kim_property kim_query
reset_ids reset_atom_ids reset_mol_ids message server
dihedral_write displace_atoms info minimize read_data
read_dump read_restart replicate rerun run
set velocity write_coeff write_data write_dump
write_restart

Thanks, this all looks normal. Unfortunately, I cannot reproduce it. I see neither the nvcc_wrapper warning nor the pthread error.

I do not use WSL2 on Windows, but native Fedora 41.
Also, my CUDA toolkit version is 12.8.

$ lmp -h

Large-scale Atomic/Molecular Massively Parallel Simulator - 12 Jun 2025 - Development
Git info (collected-small-changes / patch_12Jun2025-95-gc98ebb669e-modified)

[...]

OS: Linux "Fedora Linux 41 (Forty One)" 6.14.8-200.fc41.x86_64 x86_64

Compiler: GNU C++ 14.3.1 20250523 (Red Hat 14.3.1-1) with OpenMP 4.5
C++ standard: C++17
Embedded fmt library version: 10.2.0
Embedded JSON class version: 3.12.0

MPI v4.1: MPICH Version:      4.2.2
MPICH Release date: Wed Jul  3 09:16:22 AM CDT 2024
MPICH ABI:          16:2:4

Accelerator configuration:

KOKKOS package API: CUDA Serial
KOKKOS package precision: double
Kokkos library version: 4.6.0

FFT information:

FFT precision  = double
FFT engine  = mpiFFT
FFT library = FFTW3 with threads
KOKKOS FFT engine  = mpiFFT
KOKKOS FFT library = cuFFT

Active compile time flags:

-DLAMMPS_GZIP
-DLAMMPS_PNG
-DLAMMPS_JPEG
-DLAMMPS_FFMPEG
-DLAMMPS_SMALLBIG
sizeof(smallint): 32-bit
sizeof(imageint): 32-bit
sizeof(tagint):   32-bit
sizeof(bigint):   64-bit

Available compression formats:

Extension: .gz     Command: gzip
Extension: .bz2    Command: bzip2
Extension: .zst    Command: zstd
Extension: .xz     Command: xz
Extension: .lzma   Command: xz
Extension: .lz4    Command: lz4


Installed packages:

KOKKOS KSPACE MANYBODY MOLECULE RIGID 

[...]

I would suggest removing any add-on GCC compilers and their development packages and leaving only the "native" one in place. Beyond that, I have nothing else to suggest.
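On Ubuntu, something along these lines should show what is installed and let you drop the extra toolchain. This is only a sketch; the package names (e.g. g++-13 from a PPA) are assumptions based on what you said you tried:

    # list installed GCC/G++ versions and what g++ currently resolves to
    dpkg -l | grep -E "^ii +(gcc|g\+\+)-[0-9]+"
    g++ --version | head -n 1
    # remove the add-on compiler, keeping the distro default g++-11 in place
    sudo apt remove gcc-13 g++-13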


Thank you for your time. I guess I will reset the distro and try a fresh one.
One final question: do you think there will be a significant speedup using the KOKKOS package compared to PKG_GPU? I am using a Zen 4 CPU + RTX 4090, simulating a long-chain polymer solution with 278,000 atoms.

That is difficult to say, but since you have a consumer GPU, it is probably faster to use the GPU package in mixed precision mode (unless you need the high accuracy of double precision). KOKKOS currently requires double precision. Single and mixed precision support is under way: Add support for single precision (FP32) and mixed precision to KOKKOS… by stanmoore1 · Pull Request #4608 · lammps/lammps · GitHub
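If you go the GPU package route, the run command only needs the gpu suffix plus the package flags. A minimal sketch (the input file name in.polymer and the rank count are placeholders to adapt to your system):

    # GPU package in mixed precision (chosen at build time with -D GPU_PREC=mixed);
    # offload the pair styles to the RTX 4090 and keep a few MPI ranks on the CPU
    mpirun -np 8 ./lmp -in in.polymer -sf gpu -pk gpu 1 pair/only yes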

On a single socket AMD Ryzen Threadripper PRO 7985WX 64-Cores machine with an NVIDIA A6000 GPU, I get:

  • for mpirun -np 64 ./lmp -in in.rhodo.scaled -v x 2 -v y 2 -v z 2 -sf omp
    Loop time of 2.61037 on 64 procs for 100 steps with 256000 atoms
    
    Performance: 6.620 ns/day, 3.626 hours/ns, 38.309 timesteps/s, 9.807 Matom-step/s
    98.0% CPU use with 64 MPI tasks x 1 OpenMP threads
    
  • for mpirun -np 12 ./lmp -in in.rhodo.scaled -v x 2 -v y 2 -v z 2 -sf gpu -pk gpu 0 pair/only yes
    Loop time of 2.31076 on 12 procs for 100 steps with 256000 atoms
    
    Performance: 7.478 ns/day, 3.209 hours/ns, 43.276 timesteps/s, 11.079 Matom-step/s
    97.2% CPU use with 12 MPI tasks x 1 OpenMP threads
    
    This is with mixed precision using OpenCL
  • for ./lmp -in in.rhodo.scaled -v x 2 -v y 2 -v z 2 -k on g 1 -sf kk -pk kokkos neigh half
    Loop time of 7.32863 on 1 procs for 100 steps with 256000 atoms
    
    Performance: 2.358 ns/day, 10.179 hours/ns, 13.645 timesteps/s, 3.493 Matom-step/s
    99.3% CPU use with 1 MPI tasks x 1 OpenMP threads
    

Let me see if I understood correctly. So, running 64 processes is slightly slower than running just 12 processes (on the 64-core Threadripper)? And with KOKKOS (since it is double precision), it is significantly slower at 2.35 ns/day. Am I correct?

Not quite. 64 processes CPU-only is about as fast as 12 CPU processes plus 1 GPU (used for the pair style only) with the GPU package (using fewer or more CPUs is slower on my machine).
KOKKOS is GPU-resident, and thus there is very little benefit from using multiple CPUs; in this case LAMMPS actually gets slower.
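If you want to find that sweet spot on your own machine, a simple scan over MPI rank counts is enough. A rough sketch using the in.rhodo.scaled benchmark from above (file locations and the rank counts to try are assumptions):

    # time the same 100-step benchmark with different numbers of MPI ranks
    for np in 1 2 4 8 12 24 64; do
        echo "=== np=$np ==="
        mpirun -np $np ./lmp -in in.rhodo.scaled -v x 2 -v y 2 -v z 2 \
            -sf gpu -pk gpu 0 pair/only yes | grep -E "Loop time|Performance"
    done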

For details, see: 7. Accelerate performance — LAMMPS documentation
