
Models loading on CPU instead of GPU after updating version #1962

Open
KuraiAI opened this issue Mar 5, 2025 · 3 comments

@KuraiAI

KuraiAI commented Mar 5, 2025

I updated my version because some DeepSeek models were not loading; after updating, they load, but only on CPU. Other older models on my system that used to load on GPU now load only on CPU as well.
I noticed this line in particular, which others have also mentioned for the same issue:
tensor 'token_embd.weight' (q4_K) (and 322 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead

I downgraded the version to 0.3.6 and it loads onto my GPU now.

I can just use the older version, but it would be nice if this gets fixed so that those of us with this issue aren't locked out of newer versions.
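
For anyone reproducing this, here is a minimal sketch using llama-cpp-python's high-level API to see whether layers actually end up on the GPU (the model path is a placeholder):

```python
from llama_cpp import Llama

# Sketch: the model path is a placeholder. n_gpu_layers=-1 requests that all
# layers be offloaded, and verbose=True prints the backend/buffer assignments,
# including the CPU_AARCH64 warning quoted above when it occurs.
llm = Llama(
    model_path="./model.gguf",
    n_gpu_layers=-1,
    verbose=True,
)
# In the verbose log, a line like "offloaded N/N layers to GPU" indicates
# GPU offload; on the affected versions everything stays on CPU instead.
```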

@mcglynnfinn

I'm having the exact same issue, but only if I go above 0.3.4.

What CUDA version are you using, and are you using a pre-made wheel (e.g. from https://abetlen.github.io/llama-cpp-python/whl/)?
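
Something like this sketch can show which version is installed and whether the wheel reports GPU offload support (assuming `llama_supports_gpu_offload` is exposed by the low-level bindings in your release):

```python
import llama_cpp

# Print the installed llama-cpp-python version.
print("llama-cpp-python:", llama_cpp.__version__)

# Assumed to be exposed by the low-level bindings in recent releases;
# it should return False if the installed wheel has no GPU backend.
print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())
```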

@PeterTucker

Same issue in Docker version... womp.

@Willian7004

I have a similar issue, and the model I use is not supported in v0.3.6.
