-
When generating embeddings (both using the …), the work appears to run on the CPU rather than the GPU. It's most likely me who is missing something, but I was under the impression that generating the embeddings could be GPU accelerated as well? When prompting the model with the …, the GPU is used.
Replies: 2 comments 3 replies
-
Update to latest `master` - fixed here bf83bff
-
@MadsRC, the issue you noticed (embedding generation relying on the CPU instead of the GPU) was likely due to an older version of llama.cpp. The reason your prompts utilized the GPU while embeddings didn't is that GPU acceleration for embeddings was not fully implemented in earlier builds. Updating to the latest master branch and recompiling fixed this, as confirmed in your follow-up. If anyone else faces a similar issue, ensure you're running the latest version and always recompile after updating. 🚀
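For anyone driving this through the library rather than the command-line examples, here is a minimal sketch of GPU-offloaded embedding generation via the llama.cpp C API. It is only an illustration written against a recent `llama.h`: function and field names (e.g. `n_gpu_layers`, `embeddings`, the `llama_backend_init` signature) have shifted between versions, and the model path below is a placeholder.

```c
// Illustrative sketch only: written against a recent llama.h. Names and
// signatures vary between llama.cpp versions, so adapt it to your build.
#include <stdio.h>
#include "llama.h"

int main(void) {
    llama_backend_init();  // one-time backend init (older builds take a bool numa argument)

    // Model parameters: n_gpu_layers controls how many layers are offloaded
    // to the GPU. This is the same knob the examples expose as -ngl.
    struct llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;  // offload as many layers as will fit

    struct llama_model * model =
        llama_load_model_from_file("model.gguf", mparams);  // placeholder path
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // Context parameters: ask the context to produce embeddings
    // (older headers call this field `embedding` instead of `embeddings`).
    struct llama_context_params cparams = llama_context_default_params();
    cparams.embeddings = true;

    struct llama_context * ctx = llama_new_context_with_model(model, cparams);

    // ... tokenize the input, feed it through llama_decode(), then read the
    // vector with llama_get_embeddings(ctx) (it has llama_n_embd(model) floats).

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

The `n_gpu_layers` field corresponds to the `-ngl` / `--n-gpu-layers` flag of the command-line examples; with layers offloaded this way, both prompting and the embedding pass should run on the GPU on current builds.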