0.13.5
The Vulkan matmul shader (matmul-forward-q80-q40-f32.comp) was optimized.
Tested on NVIDIA Tesla T4 16 GB using the llama3_1_8b_instruct_q40 model with --buffer-float-type q80.
Before:
🔶 Pred 151 ms Sync 0 ms | Sent 0 kB Recv 0 kB | )
🔶 Pred 151 ms Sync 0 ms | Sent 0 kB Recv 0 kB | to
🔶 Pred 153 ms Sync 0 ms | Sent 0 kB Recv 0 kB | be
This version:
🔶 Pred 97 ms Sync 0 ms | Sent 0 kB Recv 0 kB | obtained
🔶 Pred 96 ms Sync 0 ms | Sent 0 kB Recv 0 kB | by
🔶 Pred 96 ms Sync 0 ms | Sent 0 kB Recv 0 kB | pert
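To put the two log excerpts side by side, here is a minimal sketch that averages the Pred times quoted above and derives the speedup. It assumes each Pred value is the prediction time for one token; the helper names (pred_times_ms, speedup) are made up for this illustration and are not part of the project.

```python
import re
from statistics import mean

# Log excerpts copied from the release note above.
BEFORE = """\
🔶 Pred 151 ms Sync 0 ms | Sent 0 kB Recv 0 kB | )
🔶 Pred 151 ms Sync 0 ms | Sent 0 kB Recv 0 kB | to
🔶 Pred 153 ms Sync 0 ms | Sent 0 kB Recv 0 kB | be
"""

AFTER = """\
🔶 Pred 97 ms Sync 0 ms | Sent 0 kB Recv 0 kB | obtained
🔶 Pred 96 ms Sync 0 ms | Sent 0 kB Recv 0 kB | by
🔶 Pred 96 ms Sync 0 ms | Sent 0 kB Recv 0 kB | pert
"""


def pred_times_ms(log: str) -> list[int]:
    """Extract every 'Pred <N> ms' value from a log excerpt (hypothetical helper)."""
    return [int(value) for value in re.findall(r"Pred (\d+) ms", log)]


def speedup(before_log: str, after_log: str) -> float:
    """Ratio of mean Pred time before the change to mean Pred time after it."""
    return mean(pred_times_ms(before_log)) / mean(pred_times_ms(after_log))


before_ms = mean(pred_times_ms(BEFORE))
after_ms = mean(pred_times_ms(AFTER))

print(f"before: {before_ms:.1f} ms")
print(f"after:  {after_ms:.1f} ms")
print(f"speedup: {speedup(BEFORE, AFTER):.2f}x")
```

On these three samples per run, the mean Pred time drops from about 151.7 ms to about 96.3 ms, roughly a 1.57x speedup. Three samples is a small window, so treat the figure as indicative rather than a full benchmark.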