metal : optimize MoE for large batches #13388

ggerganov · 2025-05-08T19:20:08Z

Utilize #12850 to improve the mat-mat MUL_MAT_ID performance:

- Map src1 [n_embd, n_expert_used, n_tokens] -> hsrc1 [n_embd, n_tokens, n_expert]
- Perform regular mat-mat multiplication src0 x hsrc1 with dynamic neh11(expert_id)
- Unmap the result back to dst
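
For readers unfamiliar with the map / multiply / unmap idea, here is a minimal CPU sketch of the three steps. It is not the Metal implementation from this PR: the struct `MoeShapes`, the function `mul_mat_id_grouped`, the field `n_out`, the `ids` tensor layout, and the dense row-major float layouts are all assumptions made for illustration. `neh11` is read here as the number of rows routed to a given expert.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shape bundle; names follow the PR description where possible.
struct MoeShapes {
    int n_embd;        // length of each input row
    int n_out;         // output rows per expert matrix (assumed name)
    int n_expert;      // total number of experts
    int n_expert_used; // experts selected per token
    int n_tokens;      // batch size
};

// Assumed row-major layouts (illustrative only, not ggml's actual strides):
//   src0 : [n_expert][n_out][n_embd]
//   src1 : [n_tokens][n_expert_used][n_embd]
//   ids  : [n_tokens][n_expert_used]
//   dst  : [n_tokens][n_expert_used][n_out]
std::vector<float> mul_mat_id_grouped(const MoeShapes & s,
                                      const std::vector<float> & src0,
                                      const std::vector<float> & src1,
                                      const std::vector<int>   & ids) {
    const int n_rows = s.n_tokens * s.n_expert_used;
    assert(ids.size() == static_cast<std::size_t>(n_rows));

    // Step 1 (map): bucket every (token, slot) row under the expert it uses.
    std::vector<std::vector<int>> rows_of(s.n_expert);
    for (int r = 0; r < n_rows; ++r) {
        assert(ids[r] >= 0 && ids[r] < s.n_expert);
        rows_of[ids[r]].push_back(r);
    }

    std::vector<float> dst(static_cast<std::size_t>(n_rows) * s.n_out, 0.0f);

    for (int e = 0; e < s.n_expert; ++e) {
        // Dynamic per-expert row count, i.e. neh11(expert_id).
        const int neh11 = static_cast<int>(rows_of[e].size());
        if (neh11 == 0) continue; // unused expert: no work at all

        // Gather this expert's rows into a contiguous hsrc1 slice.
        std::vector<float> hsrc1(static_cast<std::size_t>(neh11) * s.n_embd);
        for (int j = 0; j < neh11; ++j)
            for (int k = 0; k < s.n_embd; ++k)
                hsrc1[j * s.n_embd + k] = src1[rows_of[e][j] * s.n_embd + k];

        // Step 2: ordinary dense mat-mat product for this expert.
        const float * w = src0.data() + static_cast<std::size_t>(e) * s.n_out * s.n_embd;
        std::vector<float> hdst(static_cast<std::size_t>(neh11) * s.n_out);
        for (int j = 0; j < neh11; ++j) {
            for (int i = 0; i < s.n_out; ++i) {
                float acc = 0.0f;
                for (int k = 0; k < s.n_embd; ++k)
                    acc += w[i * s.n_embd + k] * hsrc1[j * s.n_embd + k];
                hdst[j * s.n_out + i] = acc;
            }
        }

        // Step 3 (unmap): scatter results back to their (token, slot) rows.
        for (int j = 0; j < neh11; ++j)
            for (int i = 0; i < s.n_out; ++i)
                dst[rows_of[e][j] * s.n_out + i] = hdst[j * s.n_out + i];
    }
    return dst;
}
```

The point of the grouping is that step 2 becomes a plain mat-mat multiplication per expert over a contiguous block of rows, which is the case the existing mat-mat kernels handle well when the batch is large.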

./scripts/compare-commits.sh master gg/metal-mm-id-opt -m models/qwen3-30b-a3b/ggml-model-f16.gguf -m models/qwen3-30b-a3b/ggml-model-q8_0.gguf -m models/qwen3-30b-a3b/ggml-model-q4_0-pure.gguf -m models/mixtral-8x7b-32k-fast/ggml-model-q4_0.gguf -m models/nomic-embed-text-v2-moe/ggml-model-f16.gguf -fa 1 -p 512 -n 0 -t 1

| Model | Test | t/s master | t/s gg/metal-mm-id-opt | Speedup |
| --- | --- | ---: | ---: | ---: |
| llama 8x7B Q4_0 | pp512 | 295.20 | 651.44 | 2.21 |
| nomic-bert-moe 475M F16 | pp512 | 13083.98 | 24008.05 | 1.83 |
| qwen3moe 30B.A3B F16 | pp512 | 344.46 | 1400.07 | 4.06 |
| qwen3moe 30B.A3B Q4_0 | pp512 | 759.53 | 1359.49 | 1.79 |
| qwen3moe 30B.A3B Q8_0 | pp512 | 707.46 | 1350.71 | 1.91 |
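
The Speedup column appears to be the branch throughput divided by the master throughput, rounded to two decimals. A small sketch to re-check a row, assuming that reading (the helper name `speedup` is hypothetical and not part of `compare-commits.sh`):

```cpp
#include <cassert>
#include <cmath>

// Ratio of branch tokens/s to master tokens/s; values > 1 mean the branch is faster.
double speedup(double t_master, double t_branch) {
    assert(t_master > 0.0);
    return t_branch / t_master;
}
```

For example, the qwen3moe 30B.A3B F16 row gives 1400.07 / 344.46, which is about 4.06.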

ggml-ci

metal : optimize MoE for large batches

b6a4d53

ggml-ci

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels on May 8, 2025
