Clip performance on Apple Silicon Macs #1392
Yes, this is a known limitation: GPU clip for qwen2vl does not currently work on macOS. However, it works fine on Vulkan and CUDA, and it should also work fine for other vision models like minicpm. ref: ggml-org#10896
Is there a plan to implement support?
It would have to be done upstream. It was previously disabled due to incoherent results.
Gemma 3 clip is also running on CPU with the latest KoboldCpp release:
Attempting to apply Multimodal Projector: /Users/xxx/Documents/Llava/LLavaImageTagger/mmproj-google_gemma-3-12b-it-f16.gguf
Yes, because people mentioned it did not work with GPU on macOS.
Describe the Issue
The GPU is not being used for clip with the Qwen2-VL-7B-Instruct model and an mmproj vision projector.
I get:
attempting to apply Multimodal Projector: /Users/xxx/Documents/Llava/LLavaImageTagger/mmproj-Qwen2-VL-7B-Instruct-f16.gguf
Clip will use CPU for this model!
clip_model_load: model name: Qwen2-VL-7B-Instruct
clip_model_load: description: image encoder for Qwen2VL
clip_model_load: GGUF version: 3
clip_model_load: alignment: 32
clip_model_load: n_tensors: 521
clip_model_load: n_kv: 20
clip_model_load: ftype: f16
I use the following args to start the compiled macOS build, koboldcpp-mac-arm64:
"$KOBOLDCPP_BINARY" "$TEXT_MODEL" --mmproj "$IMAGE_PROJECTOR" --flashattention --contextsize 4096 --visionmaxres 9999 --noblas --gpulayers 200 --threads 11 --blasthreads 11 --quiet &
Is it possible to address this?
Additional Information:
Apple Mac Studio M2 Max