
CLIP performance on Apple Silicon #1392

Open
foggyghost0 opened this issue Feb 25, 2025 · 5 comments

Comments

@foggyghost0

Describe the Issue

The GPU is not being used for CLIP with the Qwen2-VL-7B-Instruct model and its mmproj vision module.

I get:
attempting to apply Multimodal Projector: /Users/xxx/Documents/Llava/LLavaImageTagger/mmproj-Qwen2-VL-7B-Instruct-f16.gguf
Clip will use CPU for this model!
clip_model_load: model name: Qwen2-VL-7B-Instruct
clip_model_load: description: image encoder for Qwen2VL
clip_model_load: GGUF version: 3
clip_model_load: alignment: 32
clip_model_load: n_tensors: 521
clip_model_load: n_kv: 20
clip_model_load: ftype: f16

I use the following args to start the compiled macOS build, koboldcpp-mac-arm64:
"$KOBOLDCPP_BINARY" "$TEXT_MODEL" --mmproj "$IMAGE_PROJECTOR" --flashattention --contextsize 4096 --visionmaxres 9999 --noblas --gpulayers 200 --threads 11 --blasthreads 11 --quiet &

Is it possible to address this?

Additional Information:
Apple Mac Studio M2 Max

@LostRuins
Owner

Yes, this is a known limitation: GPU CLIP for qwen2vl does not currently work on macOS. However, it works fine on Vulkan and CUDA. It should also work fine for other vision models like MiniCPM.

ref: ggml-org#10896

@foggyghost0
Author

Yes, this is a known limitation: GPU CLIP for qwen2vl does not currently work on macOS. However, it works fine on Vulkan and CUDA. It should also work fine for other vision models like MiniCPM.

ref: ggml-org#10896

Is there a plan to implement support?

@LostRuins
Owner

It would have to be done upstream. It was previously disabled due to incoherent results.
Now that gemma3 is out, that might work better for you.
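
For example, reusing the launch command above with a gemma3 text model and its projector swapped in (the text-model filename here is a placeholder; the projector path matches the log further down):

GEMMA3_TEXT_MODEL="$HOME/Documents/Llava/gemma-3-12b-it-Q4_K_M.gguf"   # placeholder filename
GEMMA3_PROJECTOR="$HOME/Documents/Llava/LLavaImageTagger/mmproj-google_gemma-3-12b-it-f16.gguf"

"$KOBOLDCPP_BINARY" "$GEMMA3_TEXT_MODEL" --mmproj "$GEMMA3_PROJECTOR" --flashattention --contextsize 4096 \
  --visionmaxres 9999 --noblas --gpulayers 200 --threads 11 --blasthreads 11 --quiet &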

@foggyghost0
Author

Gemma 3 CLIP is also running on CPU with the latest Kobold release:

Attempting to apply Multimodal Projector: /Users/xxx/Documents/Llava/LLavaImageTagger/mmproj-google_gemma-3-12b-it-f16.gguf
Clip will use CPU for this model!
key general.file_type not found in file
Could not list CLIP model properties.
clip_init: loaded meta data with 16 key-value pairs and 439 tensors from /Users/bohdan/Documents/Llava/LLavaImageTagger/mmproj-google_gemma-3-12b-it-f16.gguf
clip_init: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_init: - kv 0: general.architecture str = clip
clip_init: - kv 1: clip.projector_type str = gemma3
clip_init: - kv 2: clip.has_text_encoder bool = false
clip_init: - kv 3: clip.has_vision_encoder bool = true
clip_init: - kv 4: clip.has_llava_projector bool = false
clip_init: - kv 5: clip.vision.image_size u32 = 896
clip_init: - kv 6: clip.vision.patch_size u32 = 14
clip_init: - kv 7: clip.vision.embedding_length u32 = 1152
clip_init: - kv 8: clip.vision.feed_forward_length u32 = 4304
clip_init: - kv 9: clip.vision.projection_dim u32 = 3840
clip_init: - kv 10: clip.vision.block_count u32 = 27
clip_init: - kv 11: clip.vision.attention.head_count u32 = 16
clip_init: - kv 12: clip.vision.attention.layer_norm_epsilon f32 = 0.000001
clip_init: - kv 13: clip.vision.image_mean arr[f32,3] = [0.500000, 0.500000, 0.500000]
clip_init: - kv 14: clip.vision.image_std arr[f32,3] = [0.500000, 0.500000, 0.500000]
clip_init: - kv 15: clip.use_gelu bool = true
clip_init: - type f32: 276 tensors
clip_init: - type f16: 163 tensors
clip_ctx: CLIP using CPU backend
key clip.use_silu not found in file
clip_init: text_encoder: 0
clip_init: vision_encoder: 1
clip_init: llava_projector: 0
clip_init: minicpmv_projector: 0
clip_init: minicpmv_version: 2
clip_init: glm_projector: 0
clip_init: model size: 814.60 MB
clip_init: metadata size: 0.18 MB
clip_init: params backend buffer size = 814.60 MB (439 tensors)
key clip.vision.image_grid_pinpoints not found in file
key clip.vision.feature_layer not found in file
key clip.vision.mm_patch_merge_type not found in file
key clip.vision.image_crop_resolution not found in file
clip_init: CPU compute buffer size = 1131.62 MiB
Load Text Model OK: True
Chat completion heuristic: Google Gemma 3.
Embedded KoboldAI Lite loaded.
Embedded API docs loaded.

@LostRuins
Owner

Yes, because people mentioned it did not work with GPU on macOS.
