
Models loading on CPU instead of GPU after updating version #1962

Open
KuraiAI opened this issue Mar 5, 2025 · 3 comments

@KuraiAI

KuraiAI commented Mar 5, 2025

I updated my version because some DeepSeek models were not loading; after updating, they load, but only on CPU. Other older models on my system that used to load on GPU now load only on CPU as well.
I noticed this line in particular, which others have also mentioned for the same issue:
tensor 'token_embd.weight' (q4_K) (and 322 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead

I downgraded the version to 0.3.6 and it loads onto my GPU now.

I can just use the older version, but it would be nice if this gets fixed so that those of us with this issue aren't locked out of newer versions.
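
For anyone reproducing this, here is a minimal sketch using llama-cpp-python's high-level API to see whether layers actually end up on the GPU (the model path is a placeholder):

```python
from llama_cpp import Llama

# Sketch: the model path is a placeholder. n_gpu_layers=-1 requests that all
# layers be offloaded, and verbose=True prints the backend/buffer assignments,
# including the CPU_AARCH64 warning quoted above when it occurs.
llm = Llama(
    model_path="./model.gguf",
    n_gpu_layers=-1,
    verbose=True,
)
# In the verbose log, a line like "offloaded N/N layers to GPU" indicates
# GPU offload; on the affected versions everything stays on CPU instead.
```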

@mcglynnfinn

I'm having the exact same issue, but only if I go above 0.3.4.

What CUDA version are you using, and are you using a pre-made wheel (e.g. from https://abetlen.github.io/llama-cpp-python/whl/)?
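
Something like this sketch can show which version is installed and whether the wheel reports GPU offload support (assuming `llama_supports_gpu_offload` is exposed by the low-level bindings in your release):

```python
import llama_cpp

# Print the installed llama-cpp-python version.
print("llama-cpp-python:", llama_cpp.__version__)

# Assumed to be exposed by the low-level bindings in recent releases;
# it should return False if the installed wheel has no GPU backend.
print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())
```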

@PeterTucker

Same issue in Docker version... womp.

@Willian7004

I have a similar issue, and the model I use is not supported in v0.3.6.
