
CLIP performance on Apple Silicon #1392

Open
foggyghost0 opened this issue Feb 25, 2025 · 5 comments

Comments

@foggyghost0

Describe the Issue

The GPU is not being used for CLIP with the Qwen2-VL-7B-Instruct model and its mmproj vision module.

I get:
attempting to apply Multimodal Projector: /Users/xxx/Documents/Llava/LLavaImageTagger/mmproj-Qwen2-VL-7B-Instruct-f16.gguf
Clip will use CPU for this model!
clip_model_load: model name: Qwen2-VL-7B-Instruct
clip_model_load: description: image encoder for Qwen2VL
clip_model_load: GGUF version: 3
clip_model_load: alignment: 32
clip_model_load: n_tensors: 521
clip_model_load: n_kv: 20
clip_model_load: ftype: f16

I use the following args to start the compiled macOS build, koboldcpp-mac-arm64:
"$KOBOLDCPP_BINARY" "$TEXT_MODEL" --mmproj "$IMAGE_PROJECTOR" --flashattention --contextsize 4096 --visionmaxres 9999 --noblas --gpulayers 200 --threads 11 --blasthreads 11 --quiet &

Is it possible to address this?

Additional Information:
Apple Mac Studio M2 Max

@LostRuins
Owner

Yes, this is a known limitation: GPU CLIP for qwen2vl does not currently work on macOS. However, it works fine on Vulkan and CUDA. It should also work fine for other vision models like MiniCPM.

ref: ggml-org#10896

@foggyghost0
Author

Yes, this is a known limitation: GPU CLIP for qwen2vl does not currently work on macOS. However, it works fine on Vulkan and CUDA. It should also work fine for other vision models like MiniCPM.

ref: ggml-org#10896

Is there a plan to implement support?

@LostRuins
Owner

It would have to be done upstream. It was previously disabled due to incoherent results.
Now that gemma3 is out, that might work better for you.
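
For example, reusing the launch command above with a gemma3 text model and its projector swapped in (the text-model filename here is a placeholder; the projector path matches the log further down):

GEMMA3_TEXT_MODEL="$HOME/Documents/Llava/gemma-3-12b-it-Q4_K_M.gguf"   # placeholder filename
GEMMA3_PROJECTOR="$HOME/Documents/Llava/LLavaImageTagger/mmproj-google_gemma-3-12b-it-f16.gguf"

"$KOBOLDCPP_BINARY" "$GEMMA3_TEXT_MODEL" --mmproj "$GEMMA3_PROJECTOR" --flashattention --contextsize 4096 \
  --visionmaxres 9999 --noblas --gpulayers 200 --threads 11 --blasthreads 11 --quiet &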

@foggyghost0
Author

Gemma 3 CLIP is also running on CPU with the latest Kobold release:

Attempting to apply Multimodal Projector: /Users/xxx/Documents/Llava/LLavaImageTagger/mmproj-google_gemma-3-12b-it-f16.gguf
Clip will use CPU for this model!
key general.file_type not found in file
Could not list CLIP model properties.
clip_init: loaded meta data with 16 key-value pairs and 439 tensors from /Users/bohdan/Documents/Llava/LLavaImageTagger/mmproj-google_gemma-3-12b-it-f16.gguf
clip_init: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_init: - kv 0: general.architecture str = clip
clip_init: - kv 1: clip.projector_type str = gemma3
clip_init: - kv 2: clip.has_text_encoder bool = false
clip_init: - kv 3: clip.has_vision_encoder bool = true
clip_init: - kv 4: clip.has_llava_projector bool = false
clip_init: - kv 5: clip.vision.image_size u32 = 896
clip_init: - kv 6: clip.vision.patch_size u32 = 14
clip_init: - kv 7: clip.vision.embedding_length u32 = 1152
clip_init: - kv 8: clip.vision.feed_forward_length u32 = 4304
clip_init: - kv 9: clip.vision.projection_dim u32 = 3840
clip_init: - kv 10: clip.vision.block_count u32 = 27
clip_init: - kv 11: clip.vision.attention.head_count u32 = 16
clip_init: - kv 12: clip.vision.attention.layer_norm_epsilon f32 = 0.000001
clip_init: - kv 13: clip.vision.image_mean arr[f32,3] = [0.500000, 0.500000, 0.500000]
clip_init: - kv 14: clip.vision.image_std arr[f32,3] = [0.500000, 0.500000, 0.500000]
clip_init: - kv 15: clip.use_gelu bool = true
clip_init: - type f32: 276 tensors
clip_init: - type f16: 163 tensors
clip_ctx: CLIP using CPU backend
key clip.use_silu not found in file
clip_init: text_encoder: 0
clip_init: vision_encoder: 1
clip_init: llava_projector: 0
clip_init: minicpmv_projector: 0
clip_init: minicpmv_version: 2
clip_init: glm_projector: 0
clip_init: model size: 814.60 MB
clip_init: metadata size: 0.18 MB
clip_init: params backend buffer size = 814.60 MB (439 tensors)
key clip.vision.image_grid_pinpoints not found in file
key clip.vision.feature_layer not found in file
key clip.vision.mm_patch_merge_type not found in file
key clip.vision.image_crop_resolution not found in file
clip_init: CPU compute buffer size = 1131.62 MiB
Load Text Model OK: True
Chat completion heuristic: Google Gemma 3.
Embedded KoboldAI Lite loaded.
Embedded API docs loaded.

@LostRuins
Owner

Yes, because people mentioned it did not work with GPU on macOS.
