CUDA problems (no kernel image is available for execution on the device) #474
Unanswered
joshuachris2001 asked this question in Q&A
Lately I've been hitting the common `no kernel image is available for execution on the device` error, and nothing I do seems to fix it. Compiling llama.cpp with CUDA support manually works fine, but not through llama-cpp-python. My usual command is `CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir`.
I have also tried a fresh Python environment and still get the same error. Am I doing something wrong? For what it's worth, my GPU's max CUDA compute capability is 5.0.
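I've been wondering whether the compiled kernels just don't include sm_50, so one thing I'm considering is forcing the architecture list at build time. A minimal sketch, assuming llama.cpp's CMake honors the standard `CMAKE_CUDA_ARCHITECTURES` cache variable:

```sh
# Rebuild from source, explicitly compiling kernels for compute capability 5.0.
# "50" matches this GPU; substitute your own capability if it differs.
CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=50" \
FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```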
Here's a log:
```
Using embedded DuckDB with persistence: data will be stored in: db
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GPU, compute capability 5.0
llama.cpp: loading model from models/13B/Manticore/Manticore-13B.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2500
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 9031.71 MB (+ 1608.00 MB per state)
llama_model_load_internal: offloading 0 repeating layers to GPU
llama_model_load_internal: offloaded 0/43 layers to GPU
llama_model_load_internal: total VRAM used: 516 MB
llama_new_context_with_model: kv self size = 1953.12 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
Enter a query: Hello?
CUDA error 209 at /tmp/pip-install-y26ymipg/llama-cpp-python_df790ba1ff86401bb68c221d8a0e2d7b/vendor/llama.cpp/ggml-cuda.cu:2830: no kernel image is available for execution on the device
```
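In case it helps, here's how I'd check which architectures actually made it into the compiled library (a sketch; the `.so` path below is illustrative and will differ in your environment):

```sh
# Locate the compiled extension inside the installed package:
find "$(python -c 'import llama_cpp, os; print(os.path.dirname(llama_cpp.__file__))')" -name '*.so'

# List what the binary embeds for each architecture:
cuobjdump --list-elf /path/to/libllama.so   # native cubins (sm_XX)
cuobjdump --list-ptx /path/to/libllama.so   # PTX the driver could JIT-compile
```

If neither an sm_50 cubin nor any PTX shows up, error 209 at kernel launch would be the expected result.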
I don't know what is going on; even PyTorch works semi-perfectly on the same GPU.
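To double-check the capability figure itself (a sketch; the `compute_cap` query needs a reasonably recent NVIDIA driver):

```sh
# Ask the driver directly:
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# Or via PyTorch, which already detects the card here:
python -c "import torch; print(torch.cuda.get_device_capability(0))"
# Both should report 5.0 / (5, 0) for this GPU.
```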