- ## experimental support of Ryzen 7x40 (Linux)
+ # experimental support of Ryzen 7x40 (Linux)
In my case: a 7940HS with 64 GB of RAM, running Fedora 41 with rocm-hip 6.2.1.

The backend only adds a mulmat(bf16) OP implemented with HIP (rocBLAS is not used). There is no limit on RAM usage (GTT/VRAM): weights are allocated in RAM.
@@ -17,113 +17,116 @@ build/igpu/bin/llama-cli --color -ngl 999 --no-mmap -ctk bf16 -ctv bf16 -m Meta-

To be fair, there are some random crashes with an 'MES' error; this may need a fix in the AMD firmware.

+ 01/03/2025: first version of the kernel (V1) (supports only BF16 quantization)
+ 14/03/2025: new kernel (V2) (supports only BF16 quantization)
+
+ Note: the V2 kernel has a dedicated kernel for gemv (i.e. token generation)
+ Next:
+ - adapt the V1 kernel for small prompt processing (2-32?)
+ - create a kernel for FP8 and support optional conversion of weights (FP16/BF16/FP32) to BFP on load
+ - add FP16 quantization support
+ - create a true block kernel for CPU ("BLIS"-like)?
+
Some results (when it does not crash):

+ ## Llama-3.2-1B-Instruct/BF16.gguf
+ | model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
+ | ----------------- | ---------: | -------: | -----: | -----: | ------: | -------------: | --------------: | ---------------: |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp1 | 23.26 ± 0.02 | 18.53 ± 0.17 | 27.59 ± 0.05 |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp2 | 45.39 ± 0.04 | 36.20 ± 0.33 | 34.22 ± 0.03 |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp4 | 90.47 ± 0.06 | 71.78 ± 0.22 | 65.12 ± 0.23 |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp8 | 176.86 ± 2.93 | 139.26 ± 1.73 | 119.79 ± 0.08 |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp16 | 344.33 ± 0.26 | 266.42 ± 3.15 | 200.51 ± 0.99 |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp32 | 562.30 ± 9.50 | 422.50 ± 2.38 | 429.52 ± 0.68 |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp48 | 665.70 ± 9.38 | 653.25 ± 1.98 | 601.83 ± 2.96 |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp64 | 679.13 ± 8.38 | 717.96 ± 4.81 | 760.94 ± 0.32 |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp128 | 723.15 ± 3.93 | 990.37 ± 1.74 | 1062.69 ± 2.78 |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp192 | 738.65 ± 3.53 | 1131.50 ± 6.55 | 1304.20 ± 1.37 |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp256 | 746.87 ± 2.49 | 1151.29 ± 7.71 | 1326.96 ± 2.51 |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp384 | 714.54 ± 5.95 | 1178.65 ± 1.41 | 1220.25 ± 3.79 |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp512 | 677.09 ± 2.49 | 963.16 ± 0.77 | 950.69 ± 1.97 |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp768 | 665.30 ± 1.35 | 901.93 ± 1.94 | 884.07 ± 1.66 |
+ | llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | tg16 | 23.00 ± 0.10 | 18.26 ± 0.04 | 27.69 ± 0.08 |
+
+
+ ## Llama-3.2-3B-Instruct/BF16.gguf
+ | model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
+ | --------------- | ---------: | -------: | -----: | -----: | -------: | --------------: | --------------: | --------------: |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp1 | 8.94 ± 0.07 | 7.85 ± 0.05 | 11.03 ± 0.02 |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp2 | 17.56 ± 0.15 | 15.67 ± 0.04 | 14.61 ± 0.01 |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp4 | 35.02 ± 0.25 | 31.11 ± 0.29 | 27.86 ± 0.01 |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp8 | 69.18 ± 0.52 | 61.01 ± 0.17 | 51.21 ± 0.03 |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp16 | 131.72 ± 1.09 | 117.77 ± 0.26 | 86.80 ± 0.16 |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp32 | 209.28 ± 3.42 | 185.05 ± 0.90 | 178.08 ± 0.27 |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp48 | 232.70 ± 3.41 | 273.60 ± 0.66 | 249.61 ± 0.25 |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp64 | 237.90 ± 3.47 | 300.62 ± 0.68 | 313.17 ± 0.33 |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp128 | 261.37 ± 4.04 | 390.84 ± 0.55 | 438.12 ± 0.16 |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp192 | 263.82 ± 0.60 | 445.00 ± 1.70 | 506.12 ± 0.62 |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp256 | 265.27 ± 0.71 | 450.11 ± 5.35 | 516.21 ± 7.91 |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp384 | 261.27 ± 0.61 | 470.54 ± 0.32 | 485.27 ± 1.81 |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp512 | 254.72 ± 0.25 | 441.51 ± 2.56 | 480.40 ± 0.19 |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp768 | 253.87 ± 0.41 | 429.79 ± 0.43 | 462.86 ± 0.30 |
+ | llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | tg16 | 8.90 ± 0.03 | 7.85 ± 0.02 | 11.02 ± 0.00 |
+
+
## Meta-Llama-3.1-8B-Instruct/BF16.gguf
- ### ref CPU:
-
- | model | size | params | backend | threads | test | t/s |
- | ----------------| ---------: | ---------: | ---------- | ------: | -----: | ------------: |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp1 | 3.84 ± 0.01 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp2 | 7.51 ± 0.04 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp4 | 14.96 ± 0.05 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp8 | 29.62 ± 0.11 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp16 | 56.31 ± 0.16 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp32 | 80.67 ± 1.50 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp64 | 83.32 ± 0.66 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp128 | 93.48 ± 1.13 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp256 | 102.99 ± 0.22 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | pp512 | 99.32 ± 0.57 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | CPU | 8 | tg16 | 3.84 ± 0.01 |
-
- ### IGPU:
-
- | model | size | params | backend | ngl | type_kv | test | t/s |
- | ---------------- | ---------: | ------: | ------- | --: | ------: | -----: | ------------: |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp1 | 3.84 ± 0.01 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp2 | 7.57 ± 0.07 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp4 | 15.15 ± 0.07 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp8 | 29.95 ± 0.13 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp16 | 57.99 ± 0.18 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp32 | 88.34 ± 0.29 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp64 | 132.74 ± 0.69 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp128 | 152.85 ± 0.94 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp256 | 182.64 ± 7.56 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp384 | 201.33 ± 1.37 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | pp512 | 191.93 ± 1.26 |
- | llama 8B BF16 | 14.96 GiB | 8.03 B | IGPU | 999 | bf16 | tg16 | 3.81 ± 0.01 |
+ | model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
+ | --------------- | ---------: | -------: | -----: | -----: | -----: | -------------: | -------------: | -------------: |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp1 | 3.88 ± 0.01 | 3.88 ± 0.01 | 4.88 ± 0.00 |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp2 | 7.59 ± 0.00 | 7.74 ± 0.01 | 7.40 ± 0.01 |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp4 | 15.04 ± 0.06 | 15.43 ± 0.11 | 14.20 ± 0.03 |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp8 | 29.73 ± 0.13 | 30.23 ± 0.08 | 26.37 ± 0.02 |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp16 | 56.55 ± 0.27 | 58.55 ± 0.53 | 45.95 ± 0.04 |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp32 | 84.81 ± 0.90 | 91.54 ± 0.28 | 83.38 ± 0.01 |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp48 | 90.43 ± 1.76 | 114.77 ± 0.42 | 116.55 ± 0.09 |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp64 | 85.45 ± 0.71 | 137.17 ± 0.31 | 139.46 ± 1.12 |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp128 | 103.68 ± 0.13 | 152.59 ± 1.25 | 195.33 ± 0.22 |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp192 | 107.07 ± 0.18 | 183.30 ± 0.56 | 215.62 ± 0.93 |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp256 | 107.43 ± 0.28 | 185.74 ± 1.14 | 235.19 ± 0.86 |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp384 | 106.74 ± 0.11 | 213.56 ± 1.07 | 230.65 ± 0.09 |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp512 | 104.39 ± 0.17 | 203.01 ± 0.39 | 232.16 ± 0.25 |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp768 | 104.19 ± 0.10 | 194.98 ± 0.57 | 225.46 ± 0.40 |
+ | llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | tg16 | 3.88 ± 0.01 | 3.88 ± 0.01 | 4.87 ± 0.01 |


## Mistral-Nemo-Instruct-2407/BF16.gguf
- ### ref CPU:
-
- | model | size | params | backend | threads | test | t/s |
- | --------------- | ---------: | ------: | ------- | ------: | -----: | -----------: |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp1 | 1.81 ± 0.00 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp2 | 3.56 ± 0.01 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp4 | 7.04 ± 0.06 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp8 | 14.10 ± 0.08 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp16 | 27.19 ± 0.17 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp32 | 42.20 ± 0.54 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp64 | 52.12 ± 0.29 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp128 | 61.92 ± 0.19 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp256 | 62.39 ± 0.16 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp384 | 63.36 ± 0.29 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | pp512 | 62.51 ± 0.02 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | CPU | 8 | tg16 | 1.81 ± 0.00 |
-
- ### IGPU:
-
- | model | size | params | backend | ngl | type_kv | test | t/s |
- | --------------- | ---------: | ------: | ------- | --: | ------: | ----: | ------------: |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp1 | 2.70 ± 0.01 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp2 | 5.35 ± 0.01 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp4 | 10.62 ± 0.06 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp8 | 21.06 ± 0.05 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp16 | 40.59 ± 0.41 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp32 | 62.93 ± 0.17 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp64 | 92.03 ± 0.20 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp128 | 101.68 ± 0.24 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp256 | 122.88 ± 0.68 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp384 | 136.23 ± 0.28 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | pp512 | 124.70 ± 5.07 |
- | llama 13B BF16 | 22.81 GiB | 12.25 B | IGPU | 999 | bf16 | tg16 | 2.69 ± 0.00 |
+ | model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
+ | --------------- | ---------: | -------: | -----: | -----: | ------: | ------------: | -------------: | -------------: |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp1 | 2.52 ± 0.00 | 2.76 ± 0.00 | 3.16 ± 0.01 |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp2 | 4.94 ± 0.01 | 5.49 ± 0.01 | 4.90 ± 0.00 |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp4 | 9.82 ± 0.03 | 10.92 ± 0.01 | 9.42 ± 0.06 |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp8 | 19.40 ± 0.09 | 21.60 ± 0.02 | 17.56 ± 0.01 |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp16 | 36.85 ± 0.35 | 42.03 ± 0.04 | 30.77 ± 0.06 |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp32 | 50.40 ± 0.97 | 65.33 ± 0.09 | 56.43 ± 0.12 |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp48 | 52.77 ± 1.60 | 77.46 ± 0.12 | 76.93 ± 0.22 |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp64 | 54.65 ± 0.30 | 94.48 ± 0.20 | 93.57 ± 0.05 |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp128 | 65.72 ± 0.14 | 103.87 ± 0.43 | 127.90 ± 0.08 |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp192 | 67.66 ± 0.16 | 121.43 ± 0.18 | 143.60 ± 0.22 |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp256 | 68.45 ± 0.13 | 130.03 ± 0.24 | 156.00 ± 0.32 |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp384 | 67.64 ± 0.08 | 142.89 ± 0.07 | 154.52 ± 0.31 |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp512 | 67.02 ± 0.05 | 136.18 ± 0.06 | 156.22 ± 0.15 |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp768 | 66.74 ± 0.03 | 130.78 ± 0.13 | 151.59 ± 0.67 |
+ | llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | tg16 | 2.52 ± 0.00 | 2.76 ± 0.00 | 3.16 ± 0.00 |

- ## Mistral-Small-24B-Instruct-2501/BF16.gguf
- ### ref CPU:
-
- | model | size | params | backend | threads | type_kv | test | t/s |
- | ----------------- | ---------: | ------: | ------- | ------: | ------: | -----: | ------------: |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp1 | 0.92 ± 0.00 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp2 | 1.81 ± 0.01 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp4 | 3.61 ± 0.01 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp8 | 7.16 ± 0.03 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp16 | 13.38 ± 0.02 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp32 | 21.26 ± 0.31 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp64 | 26.09 ± 0.10 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp128 | 29.76 ± 0.03 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp256 | 30.09 ± 0.01 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp384 | 30.72 ± 0.01 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | pp512 | 30.42 ± 0.15 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | CPU | 8 | bf16 | tg16 | 0.92 ± 0.00 |
-
- ### IGPU:
-
- | model | size | params | backend | ngl | type_kv | test | t/s |
- | ----------------- | ---------: | ------: | ------- | --: | ------: | -----: | ------------: |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp1 | 1.27 ± 0.22 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp2 | 2.73 ± 0.00 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp4 | 5.44 ± 0.01 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp8 | 10.78 ± 0.01 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp16 | 21.08 ± 0.01 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp32 | 34.35 ± 0.01 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp64 | 47.33 ± 0.08 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp128 | 51.69 ± 0.18 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp256 | 61.49 ± 0.25 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp384 | 74.41 ± 0.14 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | pp512 | 68.73 ± 3.85 |
- | Mistral-Small-24B | 43.91 GiB | 23.57 B | IGPU | 999 | bf16 | tg16 | 1.37 ± 0.00 |

+ ## Mistral-Small-24B-Instruct-2501/BF16.gguf
+ | model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
+ | ---------------- | ---------: | -------: | -----: | -----: | ------: | ------------: | ------------: | ------------: |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp1 | 1.28 ± 0.00 | 1.39 ± 0.00 | 1.64 ± 0.00 |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp2 | 2.52 ± 0.01 | 2.76 ± 0.00 | 2.71 ± 0.00 |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp4 | 5.02 ± 0.01 | 5.50 ± 0.01 | 5.26 ± 0.01 |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp8 | 9.87 ± 0.02 | 10.89 ± 0.02 | 9.94 ± 0.01 |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp16 | 18.32 ± 0.07 | 21.32 ± 0.03 | 17.86 ± 0.02 |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp32 | 25.53 ± 0.24 | 34.65 ± 0.02 | 31.50 ± 0.03 |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp48 | 24.53 ± 0.30 | 36.05 ± 0.02 | 43.93 ± 0.02 |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp64 | 25.88 ± 0.13 | 47.87 ± 0.16 | 53.96 ± 0.06 |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp128 | 29.69 ± 0.07 | 52.03 ± 0.23 | 69.64 ± 0.06 |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp192 | 29.99 ± 0.05 | 61.00 ± 0.18 | 79.73 ± 0.19 |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp256 | 30.94 ± 0.02 | 63.11 ± 0.29 | 87.30 ± 0.27 |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp384 | 32.51 ± 0.01 | 75.00 ± 0.25 | 86.26 ± 0.18 |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp512 | 32.28 ± 0.01 | 71.11 ± 0.18 | 88.11 ± 0.13 |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp768 | 32.02 ± 0.09 | 67.33 ± 0.13 | 85.47 ± 0.09 |
+ | llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | tg16 | 1.28 ± 0.00 | 1.38 ± 0.00 | 1.62 ± 0.00 |

-------------------------------
