Commit ebce03e

committed
[IGPU]: test a new Kernel
- special path for tg with good performance; nice gain for pp > 64
1 parent 0671a16 commit ebce03e

File tree

5 files changed: +867 −103 lines


README.md

+103 −100
@@ -1,4 +1,4 @@
-## experimental support of Ryzen 7x40 (Linux)
+# experimental support of Ryzen 7x40 (Linux)

In my case, a 7940HS with 64 GB of RAM, running Fedora 41 with rocm-hip 6.2.1.

The backend only adds a mul_mat(BF16) OP implemented with HIP (no use of rocBLAS). There is no limit on RAM usage (GTT/VRAM): weights are allocated in RAM.
@@ -17,113 +17,116 @@ build/igpu/bin/llama-cli --color -ngl 999 --no-mmap -ctk bf16 -ctv bf16 -m Meta-

To be fair, there are some random crashes with a 'MES' error; this may need a fix in the AMD firmware.

+01/03/2025: first version of the kernel (V1) (supports only BF16 quantisation)
+14/03/2025: new kernel (V2) (supports only BF16 quantisation)
+
+Note: the V2 kernel has a dedicated kernel for GEMV (i.e. token generation)
+Next:
+- adapt the V1 kernel for small prompt processing (2-32?)
+- create a kernel for FP8, and support optional conversion of weights (FP16/BF16/FP32) to BFP on load
+- add FP16 quantisation support
+- create a true block kernel for the CPU ("BLIS"-like)?
+
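A note on the "dedicated kernel for GEMV" above: during token generation the batch is a single vector, so every mul_mat collapses to a matrix-vector product. A minimal CPU-side sketch of that shape, assuming row-major BF16 weights (illustrative only; the actual V1/V2 kernels are HIP GPU kernels, and the names here are not from the backend):

```cpp
#include <cstdint>
#include <cstring>

// BF16 is the upper 16 bits of an IEEE-754 float32, so conversion
// to float is just a 16-bit shift into the high half.
static float bf16_to_f32(uint16_t h) {
    uint32_t u = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Naive GEMV y = W * x with W in BF16 (row-major, rows x cols) and
// x, y in float32. Token generation runs exactly this shape per layer.
void gemv_bf16(const uint16_t* W, const float* x, float* y,
               int rows, int cols) {
    for (int r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (int c = 0; c < cols; ++c)
            acc += bf16_to_f32(W[r * cols + c]) * x[c];
        y[r] = acc;
    }
}
```

On a GPU the outer loop is parallelised across rows with a wavefront-level reduction over the columns, which is why a dedicated GEMV path can beat a general GEMM kernel at batch size 1.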
Some results (when it does not crash):

+## Llama-3.2-1B-Instruct/BF16.gguf
+| model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
+| ----------------- | ---------: | -------: | -----: | -----: | ------: | -------------: | --------------: | ---------------: |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp1 | 23.26 ± 0.02 | 18.53 ± 0.17 | 27.59 ± 0.05 |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp2 | 45.39 ± 0.04 | 36.20 ± 0.33 | 34.22 ± 0.03 |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp4 | 90.47 ± 0.06 | 71.78 ± 0.22 | 65.12 ± 0.23 |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp8 | 176.86 ± 2.93 | 139.26 ± 1.73 | 119.79 ± 0.08 |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp16 | 344.33 ± 0.26 | 266.42 ± 3.15 | 200.51 ± 0.99 |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp32 | 562.30 ± 9.50 | 422.50 ± 2.38 | 429.52 ± 0.68 |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp48 | 665.70 ± 9.38 | 653.25 ± 1.98 | 601.83 ± 2.96 |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp64 | 679.13 ± 8.38 | 717.96 ± 4.81 | 760.94 ± 0.32 |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp128 | 723.15 ± 3.93 | 990.37 ± 1.74 | 1062.69 ± 2.78 |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp192 | 738.65 ± 3.53 | 1131.50 ± 6.55 | 1304.20 ± 1.37 |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp256 | 746.87 ± 2.49 | 1151.29 ± 7.71 | 1326.96 ± 2.51 |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp384 | 714.54 ± 5.95 | 1178.65 ± 1.41 | 1220.25 ± 3.79 |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp512 | 677.09 ± 2.49 | 963.16 ± 0.77 | 950.69 ± 1.97 |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp768 | 665.30 ± 1.35 | 901.93 ± 1.94 | 884.07 ± 1.66 |
+| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | tg16 | 23.00 ± 0.10 | 18.26 ± 0.04 | 27.69 ± 0.08 |
+
+
+## Llama-3.2-3B-Instruct/BF16.gguf
+| model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
+| --------------- | ---------: | -------: | -----: | -----: | -------: | --------------: | --------------: | --------------: |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp1 | 8.94 ± 0.07 | 7.85 ± 0.05 | 11.03 ± 0.02 |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp2 | 17.56 ± 0.15 | 15.67 ± 0.04 | 14.61 ± 0.01 |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp4 | 35.02 ± 0.25 | 31.11 ± 0.29 | 27.86 ± 0.01 |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp8 | 69.18 ± 0.52 | 61.01 ± 0.17 | 51.21 ± 0.03 |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp16 | 131.72 ± 1.09 | 117.77 ± 0.26 | 86.80 ± 0.16 |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp32 | 209.28 ± 3.42 | 185.05 ± 0.90 | 178.08 ± 0.27 |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp48 | 232.70 ± 3.41 | 273.60 ± 0.66 | 249.61 ± 0.25 |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp64 | 237.90 ± 3.47 | 300.62 ± 0.68 | 313.17 ± 0.33 |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp128 | 261.37 ± 4.04 | 390.84 ± 0.55 | 438.12 ± 0.16 |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp192 | 263.82 ± 0.60 | 445.00 ± 1.70 | 506.12 ± 0.62 |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp256 | 265.27 ± 0.71 | 450.11 ± 5.35 | 516.21 ± 7.91 |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp384 | 261.27 ± 0.61 | 470.54 ± 0.32 | 485.27 ± 1.81 |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp512 | 254.72 ± 0.25 | 441.51 ± 2.56 | 480.40 ± 0.19 |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp768 | 253.87 ± 0.41 | 429.79 ± 0.43 | 462.86 ± 0.30 |
+| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | tg16 | 8.90 ± 0.03 | 7.85 ± 0.02 | 11.02 ± 0.00 |
+
+
## Meta-Llama-3.1-8B-Instruct/BF16.gguf
-### ref CPU:
-
-| model | size | params | backend | threads | test | t/s |
-| ----------------| ---------: | ---------: | ---------- | ------: | -----: | ------------: |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp1 | 3.84 ± 0.01 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp2 | 7.51 ± 0.04 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp4 | 14.96 ± 0.05 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp8 | 29.62 ± 0.11 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp16 | 56.31 ± 0.16 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp32 | 80.67 ± 1.50 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp64 | 83.32 ± 0.66 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp128 | 93.48 ± 1.13 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp256 | 102.99 ± 0.22 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp512 | 99.32 ± 0.57 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | tg16 | 3.84 ± 0.01 |
-
-### IGPU:
-
-| model | size | params | backend | ngl | type_kv | test | t/s |
-| ---------------- | ---------: | ------: | ------- | --: | ------: | -----: | ------------: |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp1 | 3.84 ± 0.01 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp2 | 7.57 ± 0.07 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp4 | 15.15 ± 0.07 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp8 | 29.95 ± 0.13 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp16 | 57.99 ± 0.18 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp32 | 88.34 ± 0.29 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp64 | 132.74 ± 0.69 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp128 | 152.85 ± 0.94 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp256 | 182.64 ± 7.56 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp384 | 201.33 ± 1.37 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp512 | 191.93 ± 1.26 |
-| llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | tg16 | 3.81 ± 0.01 |
+| model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
+| --------------- | ---------: | -------: | -----: | -----: | -----: | -------------: | -------------: | -------------: |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp1 | 3.88 ± 0.01 | 3.88 ± 0.01 | 4.88 ± 0.00 |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp2 | 7.59 ± 0.00 | 7.74 ± 0.01 | 7.40 ± 0.01 |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp4 | 15.04 ± 0.06 | 15.43 ± 0.11 | 14.20 ± 0.03 |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp8 | 29.73 ± 0.13 | 30.23 ± 0.08 | 26.37 ± 0.02 |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp16 | 56.55 ± 0.27 | 58.55 ± 0.53 | 45.95 ± 0.04 |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp32 | 84.81 ± 0.90 | 91.54 ± 0.28 | 83.38 ± 0.01 |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp48 | 90.43 ± 1.76 | 114.77 ± 0.42 | 116.55 ± 0.09 |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp64 | 85.45 ± 0.71 | 137.17 ± 0.31 | 139.46 ± 1.12 |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp128 | 103.68 ± 0.13 | 152.59 ± 1.25 | 195.33 ± 0.22 |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp192 | 107.07 ± 0.18 | 183.30 ± 0.56 | 215.62 ± 0.93 |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp256 | 107.43 ± 0.28 | 185.74 ± 1.14 | 235.19 ± 0.86 |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp384 | 106.74 ± 0.11 | 213.56 ± 1.07 | 230.65 ± 0.09 |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp512 | 104.39 ± 0.17 | 203.01 ± 0.39 | 232.16 ± 0.25 |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp768 | 104.19 ± 0.10 | 194.98 ± 0.57 | 225.46 ± 0.40 |
+| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | tg16 | 3.88 ± 0.01 | 3.88 ± 0.01 | 4.87 ± 0.01 |
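A quick arithmetic cross-check of the commit's "nice gain for pp > 64" claim against the 8B means above (values copied from the table; the helper name is illustrative, not part of the backend):

```cpp
#include <cstdio>

// V2-over-CPU throughput ratio for one benchmark row (both in t/s).
double pp_gain(double v2_ts, double cpu_ts) { return v2_ts / cpu_ts; }

// Ratios for a few 8B rows from the table above: V2 is still slightly
// behind the CPU at pp32, then pulls clearly ahead from pp64 on.
void print_8b_gains() {
    struct Row { const char* test; double cpu, v2; };
    const Row rows[] = {
        {"pp32",   84.81,  83.38},
        {"pp64",   85.45, 139.46},
        {"pp128", 103.68, 195.33},
        {"pp512", 104.39, 232.16},
    };
    for (const Row& r : rows)
        std::printf("%-6s V2/CPU = %.2fx\n", r.test, pp_gain(r.v2, r.cpu));
}
```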


## Mistral-Nemo-Instruct-2407/BF16.gguf
-### ref CPU:
-
-| model | size | params | backend | threads | test | t/s |
-| --------------- | ---------: | ------: | ------- | ------: | -----: | -----------: |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp1 | 1.81 ± 0.00 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp2 | 3.56 ± 0.01 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp4 | 7.04 ± 0.06 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp8 | 14.10 ± 0.08 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp16 | 27.19 ± 0.17 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp32 | 42.20 ± 0.54 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp64 | 52.12 ± 0.29 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp128 | 61.92 ± 0.19 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp256 | 62.39 ± 0.16 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp384 | 63.36 ± 0.29 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp512 | 62.51 ± 0.02 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | tg16 | 1.81 ± 0.00 |
-
-### IGPU:
-
-| model | size | params | backend | ngl | type_kv | test | t/s |
-| --------------- | ---------: | ------: | ------- | --: | ------: | ----: | ------------: |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp1 | 2.70 ± 0.01 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp2 | 5.35 ± 0.01 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp4 | 10.62 ± 0.06 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp8 | 21.06 ± 0.05 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp16 | 40.59 ± 0.41 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp32 | 62.93 ± 0.17 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp64 | 92.03 ± 0.20 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp128 | 101.68 ± 0.24 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp256 | 122.88 ± 0.68 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp384 | 136.23 ± 0.28 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp512 | 124.70 ± 5.07 |
-| llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | tg16 | 2.69 ± 0.00 |
+| model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
+| --------------- | ---------: | -------: | -----: | -----: | ------: | ------------: | -------------: | -------------: |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp1 | 2.52 ± 0.00 | 2.76 ± 0.00 | 3.16 ± 0.01 |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp2 | 4.94 ± 0.01 | 5.49 ± 0.01 | 4.90 ± 0.00 |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp4 | 9.82 ± 0.03 | 10.92 ± 0.01 | 9.42 ± 0.06 |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp8 | 19.40 ± 0.09 | 21.60 ± 0.02 | 17.56 ± 0.01 |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp16 | 36.85 ± 0.35 | 42.03 ± 0.04 | 30.77 ± 0.06 |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp32 | 50.40 ± 0.97 | 65.33 ± 0.09 | 56.43 ± 0.12 |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp48 | 52.77 ± 1.60 | 77.46 ± 0.12 | 76.93 ± 0.22 |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp64 | 54.65 ± 0.30 | 94.48 ± 0.20 | 93.57 ± 0.05 |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp128 | 65.72 ± 0.14 | 103.87 ± 0.43 | 127.90 ± 0.08 |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp192 | 67.66 ± 0.16 | 121.43 ± 0.18 | 143.60 ± 0.22 |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp256 | 68.45 ± 0.13 | 130.03 ± 0.24 | 156.00 ± 0.32 |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp384 | 67.64 ± 0.08 | 142.89 ± 0.07 | 154.52 ± 0.31 |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp512 | 67.02 ± 0.05 | 136.18 ± 0.06 | 156.22 ± 0.15 |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp768 | 66.74 ± 0.03 | 130.78 ± 0.13 | 151.59 ± 0.67 |
+| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | tg16 | 2.52 ± 0.00 | 2.76 ± 0.00 | 3.16 ± 0.00 |

-## Mistral-Small-24B-Instruct-2501/BF16.gguf
-### ref CPU:
-
-| model | size | params | backend | threads | type_kv | test | t/s |
-| ----------------- | ---------: | ------: | ------- | ------: | ------: | -----: | ------------: |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp1 | 0.92 ± 0.00 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp2 | 1.81 ± 0.01 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp4 | 3.61 ± 0.01 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp8 | 7.16 ± 0.03 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp16 | 13.38 ± 0.02 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp32 | 21.26 ± 0.31 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp64 | 26.09 ± 0.10 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp128 | 29.76 ± 0.03 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp256 | 30.09 ± 0.01 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp384 | 30.72 ± 0.01 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp512 | 30.42 ± 0.15 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | tg16 | 0.92 ± 0.00 |
-
-### IGPU:
-
-| model | size | params | backend | ngl | type_kv | test | t/s |
-| ----------------- | ---------: | ------: | ------- | --: | ------: | -----: | ------------: |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp1 | 1.27 ± 0.22 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp2 | 2.73 ± 0.00 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp4 | 5.44 ± 0.01 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp8 | 10.78 ± 0.01 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp16 | 21.08 ± 0.01 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp32 | 34.35 ± 0.01 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp64 | 47.33 ± 0.08 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp128 | 51.69 ± 0.18 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp256 | 61.49 ± 0.25 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp384 | 74.41 ± 0.14 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp512 | 68.73 ± 3.85 |
-| Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | tg16 | 1.37 ± 0.00 |

+## Mistral-Small-24B-Instruct-2501/BF16.gguf
+| model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
+| ---------------- | ---------: | -------: | -----: | -----: | ------: | ------------: | ------------: | ------------: |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp1 | 1.28 ± 0.00 | 1.39 ± 0.00 | 1.64 ± 0.00 |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp2 | 2.52 ± 0.01 | 2.76 ± 0.00 | 2.71 ± 0.00 |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp4 | 5.02 ± 0.01 | 5.50 ± 0.01 | 5.26 ± 0.01 |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp8 | 9.87 ± 0.02 | 10.89 ± 0.02 | 9.94 ± 0.01 |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp16 | 18.32 ± 0.07 | 21.32 ± 0.03 | 17.86 ± 0.02 |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp32 | 25.53 ± 0.24 | 34.65 ± 0.02 | 31.50 ± 0.03 |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp48 | 24.53 ± 0.30 | 36.05 ± 0.02 | 43.93 ± 0.02 |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp64 | 25.88 ± 0.13 | 47.87 ± 0.16 | 53.96 ± 0.06 |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp128 | 29.69 ± 0.07 | 52.03 ± 0.23 | 69.64 ± 0.06 |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp192 | 29.99 ± 0.05 | 61.00 ± 0.18 | 79.73 ± 0.19 |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp256 | 30.94 ± 0.02 | 63.11 ± 0.29 | 87.30 ± 0.27 |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp384 | 32.51 ± 0.01 | 75.00 ± 0.25 | 86.26 ± 0.18 |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp512 | 32.28 ± 0.01 | 71.11 ± 0.18 | 88.11 ± 0.13 |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp768 | 32.02 ± 0.09 | 67.33 ± 0.13 | 85.47 ± 0.09 |
+| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | tg16 | 1.28 ± 0.00 | 1.38 ± 0.00 | 1.62 ± 0.00 |
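The same arithmetic applied to the tg16 rows across all five models quantifies the commit's "special path for tg" claim: the V2 kernel improves token generation by roughly 1.2-1.3x over the CPU at every model size (means copied from the tables above; the helper name is illustrative):

```cpp
#include <cstdio>

// V2-over-CPU speedup for token generation (tg16), both in t/s.
double tg_speedup(double v2_ts, double cpu_ts) { return v2_ts / cpu_ts; }

// tg16 means copied from the tables above, one row per model size.
void print_tg_speedups() {
    struct Row { const char* model; double cpu, v2; };
    const Row rows[] = {
        {"llama 1B BF16",  23.00, 27.69},
        {"llama 3B BF16",   8.90, 11.02},
        {"llama 8B BF16",   3.88,  4.87},
        {"llama 12B BF16",  2.52,  3.16},
        {"llama 24B BF16",  1.28,  1.62},
    };
    for (const Row& r : rows)
        std::printf("%-15s V2/CPU tg16 = %.2fx\n",
                    r.model, tg_speedup(r.v2, r.cpu));
}
```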

-------------------------------
