0.13.5

b4rtaz released this 19 Apr 20:25
8909be9

The Vulkan matmul shader (matmul-forward-q80-q40-f32.comp) was optimized.

Tested on NVIDIA Tesla T4 16 GB using the llama3_1_8b_instruct_q40 model with --buffer-float-type q80.

Before:

🔶 Pred  151 ms Sync    0 ms | Sent     0 kB Recv     0 kB | )
🔶 Pred  151 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  to
🔶 Pred  153 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  be

This version:

🔶 Pred   97 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  obtained
🔶 Pred   96 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  by
🔶 Pred   96 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  pert
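
To put a number on the change, the per-token `Pred` times from the two log excerpts above can be averaged and compared. The snippet below is a minimal sketch, not part of the project: the helper name `pred_times_ms` and the regular expression are assumptions made for illustration, and three tokens per run is a very small sample.

```python
import re
from statistics import mean

# Log excerpts copied from the release notes above.
BEFORE = """
🔶 Pred  151 ms Sync    0 ms | Sent     0 kB Recv     0 kB | )
🔶 Pred  151 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  to
🔶 Pred  153 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  be
"""

AFTER = """
🔶 Pred   97 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  obtained
🔶 Pred   96 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  by
🔶 Pred   96 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  pert
"""

# Assumed pattern: the prediction time is the integer between "Pred" and "ms".
PRED_RE = re.compile(r"Pred\s+(\d+)\s+ms")


def pred_times_ms(log: str) -> list[int]:
    """Hypothetical helper: extract every per-token Pred time in milliseconds."""
    return [int(value) for value in PRED_RE.findall(log)]


before_ms = mean(pred_times_ms(BEFORE))
after_ms = mean(pred_times_ms(AFTER))
speedup = before_ms / after_ms

print(f"before: {before_ms:.1f} ms/token ({1000 / before_ms:.2f} tokens/s)")
print(f"after:  {after_ms:.1f} ms/token ({1000 / after_ms:.2f} tokens/s)")
print(f"speedup: {speedup:.2f}x")
```

On these six samples the mean prediction time drops from roughly 152 ms to roughly 96 ms per token, a speedup of about 1.57x. A longer run would be needed for a stable figure.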