mixtral-8x7b-instruct-v0.1.Q4_0.gguf perf on MacBook Pro M3 Max 36GB vs Xeon 3435X 256GB 2x 20GB RTX 4000 GPUs #4743

Answered by ai-bits
ai-bits asked this question in Q&A

I came across the discussion "Performance of llama.cpp on Apple Silicon M-series", which has at least one (eye-watering) result for a dual (!) Xeon Platinum on Ubuntu 22.
OMG!
First, I will try to run the test CPU-only on the Xeon (2x AVX-512, 256GB DDR5).
Second, I hope the model fits into the 40GB of GDDR6 (X?) across the 2x 20GB RTX 4000.
I guess the test can fall back to partial offload of layers if the model does not fit into the GPUs.
It would be nice if everything fit into ONE GPU, to avoid the overhead of sharding across the GPUs over PCIe 4.
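
To get a rough feel for the "does it fit?" question before running anything, here is a minimal back-of-the-envelope sketch. It is only an estimate under stated assumptions: the 26.0 GiB model size and the 2 GiB per-GPU reserve are placeholder guesses (check the actual .gguf file size), `layers_that_fit` is a made-up helper name, and it pretends all 32 Mixtral layers are the same size while ignoring the KV cache and non-layer tensors.

```python
def layers_that_fit(model_gib, n_layers, vram_gib_per_gpu, n_gpus=1, reserve_gib=2.0):
    """Estimate how many whole layers can be offloaded to the GPUs.

    model_gib        -- size of the model file in GiB (ASSUMPTION: supply your own)
    n_layers         -- number of transformer layers (32 for Mixtral 8x7B)
    vram_gib_per_gpu -- VRAM per GPU in GiB
    reserve_gib      -- per-GPU headroom for context/scratch buffers (ASSUMPTION)
    """
    # Crude simplification: treat every layer as an equal slice of the file.
    per_layer_gib = model_gib / n_layers
    usable_gib = max(0.0, vram_gib_per_gpu - reserve_gib)
    # A layer lives on exactly one GPU, so count whole layers per GPU.
    per_gpu_layers = int(usable_gib // per_layer_gib)
    return min(n_layers, per_gpu_layers * n_gpus)


if __name__ == "__main__":
    MODEL_GIB = 26.0  # placeholder, not a measured value
    print(layers_that_fit(MODEL_GIB, 32, 20.0, n_gpus=1))  # 22 with these placeholders
    print(layers_that_fit(MODEL_GIB, 32, 20.0, n_gpus=2))  # 32 with these placeholders
```

With these placeholder numbers a single 20GB card would take only part of the model, while two cards would take all layers. The resulting count is the kind of value one would pass to llama.cpp's `-ngl` / `--n-gpu-layers` option, to be adjusted after watching real VRAM usage.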

Replies: 2 comments

Answer selected by ai-bits
Category
Q&A
2 participants