Can't install with GPU support with CUDA toolkit 12.9 and CUDA 12.9 #2013

Open · hunainahmedj opened this issue May 5, 2025 · 0 comments

Description:
I am trying to install llama-cpp-python with CUDA support, but I run into build errors. All the relevant information is attached below. I can install it without GPU support just fine.
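
For contrast, the CPU-only install that succeeds on the same machine is just the plain pip invocation (shown here for completeness, mirroring the failing command under Reproduction Steps below):

```
pip install llama-cpp-python
```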

Environment:

  • GPU: NVIDIA RTX 5080
  • OS: Ubuntu 24.04.2 LTS
  • Python: 3.10/3.12 (tried both)
  • GCC: 13.3.0
  • G++: 13.3.0
  • CUDA: 12.9
  • CUDA Toolkit: 12.9
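
For reference, these versions can be confirmed with the usual toolchain queries (a minimal sketch; the nvcc path assumes the default /usr/local/cuda layout that appears in the log below):

```
# Host compilers (both reported as 13.3.0 above)
gcc --version
g++ --version

# CUDA toolkit / nvcc release (reported as 12.9)
/usr/local/cuda/bin/nvcc --version

# Driver-side CUDA version and the RTX 5080
nvidia-smi
```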

Error Log:

```
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [161 lines of output]
*** scikit-build-core 0.11.2 using CMake 3.28.3 (wheel)
*** Configuring CMake...
loading initial cache file /tmp/tmpw194gw2i/build/CMakeInit.txt
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/x86_64-linux-gnu-gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/x86_64-linux-gnu-g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.43.0")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.41")
-- CUDA Toolkit found
-- Using CUDA architectures: native
-- The CUDA compiler identification is NVIDIA 12.9.41
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- CUDA host compiler is GNU 13.3.0

  -- Including CUDA backend
  CMake Warning at vendor/llama.cpp/ggml/CMakeLists.txt:298 (message):
    GGML build version fixed at 1 likely due to a shallow clone.


  CMake Warning (dev) at CMakeLists.txt:13 (install):
    Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
  Call Stack (most recent call first):
    CMakeLists.txt:97 (llama_cpp_python_install_target)
  This warning is for project developers.  Use -Wno-dev to suppress it.

  CMake Warning (dev) at CMakeLists.txt:21 (install):
    Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
  Call Stack (most recent call first):
    CMakeLists.txt:97 (llama_cpp_python_install_target)
  This warning is for project developers.  Use -Wno-dev to suppress it.

  CMake Warning (dev) at CMakeLists.txt:13 (install):
    Target ggml has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
  Call Stack (most recent call first):
    CMakeLists.txt:98 (llama_cpp_python_install_target)
  This warning is for project developers.  Use -Wno-dev to suppress it.

  CMake Warning (dev) at CMakeLists.txt:21 (install):
    Target ggml has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
  Call Stack (most recent call first):
    CMakeLists.txt:98 (llama_cpp_python_install_target)
  This warning is for project developers.  Use -Wno-dev to suppress it.

  -- Configuring done (6.8s)
  -- Generating done (0.0s)
  -- Build files have been written to: /tmp/tmpw194gw2i/build
  *** Building project with Ninja...
  Change Dir: '/tmp/tmpw194gw2i/build'

  Run Build Command(s): /usr/bin/ninja -v
  [1/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-hbm.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-hbm.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-hbm.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-hbm.cpp
  [2/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-threading.cpp
  [3/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-alloc.c
  [4/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-traits.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-traits.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-traits.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-traits.cpp
  [5/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp
  [6/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/amx/amx.cpp
  [7/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-backend.cpp
  [8/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-opt.cpp
  [9/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-quants.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-quants.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-quants.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-quants.c
  [10/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.cpp
  [11/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_SHARED -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU -DGGML_USE_CUDA -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-backend-reg.cpp
  [12/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml.c
  [13/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/argmax.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o
  FAILED: vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o
  /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/argmax.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o
  /usr/include/c++/13/bits/basic_string.h(3163): error: default argument is not allowed
          substr(size_type __pos = 0, size_type __n = npos) const
                                 ^
            detected during instantiation of class "std::__cxx11::basic_string<_CharT, _Traits, _Alloc> [with _CharT=char32_t, _Traits=std::char_traits<char32_t>, _Alloc=std::allocator<char32_t>]" at line 4510

  /usr/include/c++/13/bits/basic_string.h(3163): error: expected an expression
          substr(size_type __pos = 0, size_type __n = npos) const
                                   ^
            detected during instantiation of class "std::__cxx11::basic_string<_CharT, _Traits, _Alloc> [with _CharT=char32_t, _Traits=std::char_traits<char32_t>, _Alloc=std::allocator<char32_t>]" at line 4510

  /usr/include/c++/13/bits/basic_string.h(3163): error: default argument is not allowed
          substr(size_type __pos = 0, size_type __n = npos) const
                                                    ^
            detected during instantiation of class "std::__cxx11::basic_string<_CharT, _Traits, _Alloc> [with _CharT=char32_t, _Traits=std::char_traits<char32_t>, _Alloc=std::allocator<char32_t>]" at line 4510

  /usr/include/c++/13/bits/basic_string.h(3163): error: expected an expression
          substr(size_type __pos = 0, size_type __n = npos) const
                                                      ^
            detected during instantiation of class "std::__cxx11::basic_string<_CharT, _Traits, _Alloc> [with _CharT=char32_t, _Traits=std::char_traits<char32_t>, _Alloc=std::allocator<char32_t>]" at line 4510

  4 errors detected in the compilation of "/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/argmax.cu".
  [14/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-aarch64.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-aarch64.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-aarch64.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
  [15/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c
  FAILED: vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o
  /usr/bin/x86_64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c
  during RTL pass: cprop
  /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: In function ‘ggml_compute_forward_sub’:
  /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:5426:1: internal compiler error: in try_forward_edges, at cfgcleanup.cc:580
   5426 | }
        | ^
  0x108d8f4 internal_error(char const*, ...)
      ???:0
  0x1083cf2 fancy_abort(char const*, int, char const*)
      ???:0
  Please submit a full bug report, with preprocessed source (by using -freport-bug).
  Please include the complete backtrace with any bug report.
  See <file:///usr/share/doc/gcc-13/README.Bugs> for instructions.
  [16/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/diagmask.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o
  [17/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/acc.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o
  [18/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/arange.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o
  [19/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/fattn.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o
  [20/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/argsort.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o
  [21/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/count-equal.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o
  [22/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/clamp.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o
  [23/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/cross-entropy-loss.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o
  [24/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/conv-transpose-1d.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o
  [25/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/getrows.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o
  [26/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/gla.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/gla.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/gla.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/gla.cu.o
  [27/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/concat.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o
  [28/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/mmq.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o
  [29/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/llamafile/sgemm.cpp
  [30/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/out-prod.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/out-prod.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/out-prod.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/out-prod.cu.o
  [31/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/gguf.cpp
  [32/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/opt-step-adamw.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/opt-step-adamw.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/opt-step-adamw.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/opt-step-adamw.cu.o
  [33/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/im2col.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o
  [34/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-tile-f16.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o
  [35/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-tile-f32.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o
  [36/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/cpy.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o
  [37/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pad.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pad.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/pad.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pad.cu.o
  [38/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pool2d.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pool2d.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/pool2d.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pool2d.cu.o
  [39/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/convert.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o
  [40/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/norm.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/norm.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/norm.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/norm.cu.o
  [41/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/binbcast.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o
  [42/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-quants.c
  [43/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmv.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmv.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/mmv.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmv.cu.o
  [44/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o
  [45/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-wmma-f16.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-wmma-f16.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-wmma-f16.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-wmma-f16.cu.o
  [46/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmvq.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmvq.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/mmvq.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmvq.cu.o
  ninja: build stopped: subcommand failed.


  *** CMake build failed
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
```

Reproduction Steps:

```
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```
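
Note that the log shows two independent failures: nvcc 12.9 rejects GCC 13's libstdc++ headers while instantiating `std::basic_string<char32_t>` in `argmax.cu` (the four "default argument is not allowed" errors), and GCC 13.3 itself hits an internal compiler error compiling `ggml-cpu.c`. Both point at the host toolchain / CUDA interaction rather than llama-cpp-python's own code. A possible workaround sketch, not a confirmed fix: pin the CUDA architecture explicitly instead of `native` (the RTX 5080 is, to my understanding, compute capability 12.0) and build with an older host compiler such as gcc-12:

```
# Hypothetical workaround -- untested here; gcc-12 and the flag values are assumptions.
sudo apt install gcc-12 g++-12
CMAKE_ARGS="-DGGML_CUDA=on \
            -DCMAKE_CUDA_ARCHITECTURES=120 \
            -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-12 \
            -DCMAKE_C_COMPILER=gcc-12 \
            -DCMAKE_CXX_COMPILER=g++-12" \
pip install llama-cpp-python --no-cache-dir
```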
