Nicely done ⚡️
Provider: Google Cloud
VM: c3d-highcpu-30 (30 vCPUs, 15 cores, 59 GB memory), europe-west1, AMD Genoa
Distributed Llama version: 0.1.1
Each VM used 16 threads.
Average Single Token Generation Time
Llama 7B / Q40 Weights Q80 Buffer
Llama 13B / Q40 Weights Q80 Buffer
Llama 70B / Q40 Weights Q80 Buffer
Llama 70B / 1 VM
Llama 70B / 2 VM
Llama 70B / 4 VM
CPU Info