Release 0.13.5 · b4rtaz/distributed-llama

The Vulkan matmul shader (matmul-forward-q80-q40-f32.comp) was optimized, cutting per-token prediction time from about 151 ms to about 96 ms in the test below.

Tested on NVIDIA Tesla T4 16 GB using the llama3_1_8b_instruct_q40 model with --buffer-float-type q80.

Before:

🔶 Pred  151 ms Sync    0 ms | Sent     0 kB Recv     0 kB | )
🔶 Pred  151 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  to
🔶 Pred  153 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  be

This version:

🔶 Pred   97 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  obtained
🔶 Pred   96 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  by
🔶 Pred   96 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  pert
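
To put a single number on the change, the per-token `Pred` times in the two log excerpts above can be averaged and compared. The following is a minimal sketch, not part of distributed-llama: the helper name `pred_times_ms` and the regular expression are assumptions for illustration, and the three-sample excerpts are far too short to count as a rigorous benchmark.

```python
import re
from statistics import mean

# Matches the "Pred <n> ms" field of each log line (format taken from the excerpts above).
PRED_RE = re.compile(r"Pred\s+(\d+)\s+ms")


def pred_times_ms(log: str) -> list[int]:
    """Return every per-token prediction time (in ms) found in the log text."""
    return [int(value) for value in PRED_RE.findall(log)]


before_log = """
🔶 Pred  151 ms Sync    0 ms | Sent     0 kB Recv     0 kB | )
🔶 Pred  151 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  to
🔶 Pred  153 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  be
"""

after_log = """
🔶 Pred   97 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  obtained
🔶 Pred   96 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  by
🔶 Pred   96 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  pert
"""

before = mean(pred_times_ms(before_log))  # average ms per token, previous version
after = mean(pred_times_ms(after_log))    # average ms per token, this version
speedup = before / after

print(f"before:  {before:.1f} ms/token ({1000 / before:.1f} tokens/s)")
print(f"after:   {after:.1f} ms/token ({1000 / after:.1f} tokens/s)")
print(f"speedup: {speedup:.2f}x")
```

On these six samples the averages work out to roughly 151.7 ms and 96.3 ms per token, which is a speedup of about 1.57x on the Tesla T4 setup described above.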
