Releases: ggml-org/whisper.cpp
v1.7.5
Overview
This is a relatively big update with various build and CI improvements, especially for iOS and WASM. There are also some performance gains, especially for the Metal backend and probably for Arm-based devices.
Big shoutout to @danbev for stepping up and completing the maintenance roadmap for this release!
Mobile examples
All mobile examples have been refreshed. The iOS examples in particular are now much easier to build thanks to the new XCFramework workflow. This should significantly simplify integrating whisper.cpp into 3rd party iOS and macOS apps. CoreML build and convert instructions have also been updated.
WASM examples
The WASM examples are now automatically updated on each new commit and hosted on GitHub Pages at https://ggerganov.github.io/whisper.cpp/. Problems with CORS rules should be resolved.
Some performance numbers for this release:
M2 Ultra
Flash Attention ON:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 1 | 7.82 | 1.31 | 0.35 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 1 | 8.32 | 1.28 | 0.37 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 1 | 8.21 | 1.28 | 0.37 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q8_0 | 1 | 1 | 7.97 | 1.23 | 0.36 | 0.01 | ad4e350 |
M2 ULTRA | METAL | base | 1 | 1 | 13.96 | 1.80 | 0.42 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_0 | 1 | 1 | 15.19 | 1.75 | 0.42 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_1 | 1 | 1 | 15.09 | 1.75 | 0.42 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q8_0 | 1 | 1 | 14.45 | 1.70 | 0.41 | 0.02 | ad4e350 |
M2 ULTRA | METAL | small | 1 | 1 | 40.08 | 3.54 | 0.86 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q5_0 | 1 | 1 | 45.07 | 3.51 | 0.88 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q5_1 | 1 | 1 | 45.05 | 3.52 | 0.88 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q8_0 | 1 | 1 | 42.04 | 3.34 | 0.85 | 0.05 | ad4e350 |
M2 ULTRA | METAL | medium | 1 | 1 | 107.20 | 7.28 | 1.79 | 0.11 | ad4e350 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 1 | 125.02 | 6.67 | 1.83 | 0.12 | ad4e350 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 1 | 124.83 | 6.70 | 1.84 | 0.12 | ad4e350 |
M2 ULTRA | METAL | medium-q8_0 | 1 | 1 | 114.56 | 6.53 | 1.79 | 0.11 | ad4e350 |
M2 ULTRA | METAL | medium-dis | 1 | 1 | 95.96 | 1.01 | 0.23 | 0.01 | ad4e350 |
M2 ULTRA | METAL | large-v2 | 1 | 1 | 194.29 | 10.57 | 2.67 | 0.20 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 1 | 230.74 | 9.57 | 2.73 | 0.23 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 1 | 229.97 | 9.69 | 2.74 | 0.23 | ad4e350 |
M2 ULTRA | METAL | large-v2-q8_0 | 1 | 1 | 208.11 | 9.37 | 2.60 | 0.21 | ad4e350 |
M2 ULTRA | METAL | large-v2-dis | 1 | 1 | 172.72 | 1.12 | 0.26 | 0.02 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo | 1 | 1 | 174.46 | 1.74 | 0.42 | 0.03 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 1 | 205.78 | 1.54 | 0.42 | 0.04 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 1 | 186.33 | 1.50 | 0.40 | 0.03 | ad4e350 |
Flash Attention OFF:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 0 | 8.74 | 1.20 | 0.36 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 0 | 10.30 | 1.15 | 0.38 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 0 | 10.71 | 1.13 | 0.38 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q8_0 | 1 | 0 | 9.97 | 1.12 | 0.37 | 0.01 | ad4e350 |
M2 ULTRA | METAL | base | 1 | 0 | 16.77 | 1.71 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_0 | 1 | 0 | 16.92 | 1.63 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_1 | 1 | 0 | 16.84 | 1.63 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q8_0 | 1 | 0 | 16.12 | 1.63 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | small | 1 | 0 | 45.29 | 3.44 | 0.92 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q5_0 | 1 | 0 | 50.43 | 3.34 | 0.94 | 0.06 | ad4e350 |
M2 ULTRA | METAL | small-q5_1 | 1 | 0 | 50.49 | 3.35 | 0.93 | 0.06 | ad4e350 |
M2 ULTRA | METAL | small-q8_0 | 1 | 0 | 47.37 | 3.20 | 0.91 | 0.05 | ad4e350 |
M2 ULTRA | METAL | medium | 1 | 0 | 122.81 | 7.39 | 1.99 | 0.12 | ad4e350 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 0 | 140.62 | 6.73 | 2.03 | 0.14 | ad4e350 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 0 | 140.44 | 6.74 | 2.04 | 0.14 | ad4e350 |
M2 ULTRA | METAL | medium-q8_0 | 1 | 0 | 131.05 | 6.54 | 1.95 | 0.13 | ad4e350 |
M2 ULTRA | METAL | medium-dis | 1 | 0 | 110.95 | 0.99 | 0.24 | 0.02 | ad4e350 |
M2 ULTRA | METAL | large-v2 | 1 | 0 | 222.19 | 10.93 | 3.01 | 0.21 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 0 | 258.47 | 9.75 | 3.01 | 0.25 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 0 | 258.40 | 9.85 | 3.01 | 0.24 | ad4e350 |
M2 ULTRA | METAL | large-v2-q8_0 | 1 | 0 | 236.68 | 9.61 | 2.85 | 0.23 | ad4e350 |
M2 ULTRA | METAL | large-v2-dis | 1 | 0 | 199.28 | 1.12 | 0.27 | 0.02 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo | 1 | 0 | 201.49 | 1.76 | 0.45 | 0.03 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 0 | 233.70 | 1.55 | 0.46 | 0.04 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 0 | 214.20 | 1.51 | 0.44 | 0.04 | ad4e350 |
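To make the effect of Flash Attention on the M2 Ultra easier to read off, here is a minimal sketch that divides the FA OFF encoder numbers by the FA ON ones. The values are copied from the `Enc.` column of the two tables above for the non-quantized models. The helper name `speedup` is hypothetical, and the sketch assumes the column holds timings (lower is better), which the release notes do not state explicitly.

```python
# "Enc." values copied from the two M2 Ultra tables (commit ad4e350).
# Assumption: the column is a timing, so a lower number is better.
enc_fa_on = {
    "tiny": 7.82,
    "base": 13.96,
    "small": 40.08,
    "medium": 107.20,
    "large-v2": 194.29,
}
enc_fa_off = {
    "tiny": 8.74,
    "base": 16.77,
    "small": 45.29,
    "medium": 122.81,
    "large-v2": 222.19,
}


def speedup(off: float, on: float) -> float:
    """Ratio of the FA OFF value to the FA ON value (hypothetical helper)."""
    return off / on


speedups = {model: speedup(enc_fa_off[model], enc_fa_on[model]) for model in enc_fa_on}

for model, ratio in speedups.items():
    print(f"{model:10s} {ratio:.2f}x")
```

Under that assumption, the encoder is between about 1.1x and 1.2x faster with Flash Attention enabled for these five models.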
M4 Max
Flash Attention ON:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M4 Max | METAL | tiny | 1 | 1 | 15.22 | 0.89 | 0.26 | 0.01 | ad4e350 |
M4 Max | METAL | tiny-q8_0 | 1 | 1 | 14.70 | 0.86 | 0.26 | 0.01 | ad4e350 |
M4 Max | METAL | base | 1 | 1 | 25.33 | 1.36 | 0.30 | 0.02 | ad4e350 |
M4 Max | METAL | base-q8_0 | 1 | 1 | 21.27 | 1.31 | 0.30 | 0.02 | ad4e350 |
M4 Max | METAL | small | 1 | 1 | 58.43 | 2.78 | 0.60 | 0.05 | ad4e350 |
M4 Max | METAL | small-q8_0 | 1 | 1 | 60.26 | 2.39 | 0.60 | 0.05 | ad4e350 |
M4 Max | METAL | medium | 1 | 1 | 169.73 | 6.03 | 1.31 | 0.14 | ad4e350 |
M4 Max | METAL | medium-q8_0 | 1 | 1 | 176.61 | 4.99 | 1.31 | 0.14 | ad4e350 |
M4 Max | METAL | large-v2 | 1 | 1 | 316.18 | 9.60 | 2.08 | 0.24 | ad4e350 |
M4 Max | METAL | large-v2-q8_0 | 1 | 1 | 329.59 | 7.55 | 2.08 | 0.25 | ad4e350 |
Flash Attention OFF:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M4 Max | METAL | tiny | 1 | 0 | 13.12 | 0.87 | 0.29 | 0.01 | ad4e350 |
M4 Max | METAL | tiny-q8_0 | 1 | 0 | 15.90 | 0.88 | 0.31 | 0.01 | ad4e350 |
M4 Max | METAL | base | 1 | 0 | 23.10 | 1.42 | 0.34 | 0.02 | ad4e350 |
M4 Max | METAL | base-q8_0 | 1 | 0 | 27.25 | 1.31 | 0.34 | 0.02 | ad4e350 |
M4 Max | METAL | small | 1 | 0 | 71.76 | 3.02 | 0.70 | 0.06 | ad4e350 |
M4 Max | METAL | small-q8_0 | 1 | 0 | 73.88 | 2.60 | 0.71 | 0.06 | ad4e350 |
M4 Max | METAL | medium | 1 | 0 | 208.22 | 6.94 | 1.55 | 0.16 | ad4e350 |
M4 Max | METAL | medium-q8_0 | 1 | 0 | 214.65 | 5.90 | 1.57 | 0.17 | ad4e350 |
M4 Max | METAL | large-v2 | 1 | 0 | 381.72 | 11.28 | 2.51 | 0.29 | ad4e350 |
M4 Max | METAL | large-v2-q8_0 | 1 | 0 | 394.97 | 8.90 | 2.45 | 0.30 | ad4e350 |
V100
Flash Attention ON:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
V100 | AVX2 CUDA | tiny | 8 | 1 | 4.01 | 0.90 | 0.25 | 0.01 | ad4e350 |
V100 | AVX2 CUDA | tiny-q5_1 | 8 | 1 | 4.12 | 0.88 | 0.18 | 0.01 | ad4e350 |
V100 | AVX2 CUDA | base | 8 | 1 | 7.00 | 1.30 | 0.35 | 0.01 | ad4e350 |
V100 | AVX2 CUDA | base-q5_1 | 8 | 1 | 7.22 | 1.21 | 0.26 | 0.02 | ad4e350 |
V100 | AVX2 CUDA | small | 8 | 1 | 18.68 | 2.39 | 0.69 | 0.03 | ad4e350 |
V100 | AVX2 CUDA | small-q5_1 | 8 | 1 | 19.38 | 2.32 | 0.51 | 0.03 | ad4e350 |
V100 | AVX2 CUDA | medium | 8 | 1 | 53.17 | 5.15 | 1.45 | 0.06 | ad4e350 |
V100 | AVX2 CUDA | medium-q5_... |
b2365
android.java : re-add ggml source updates (#2975)
This commit updates the ggml source to include the new unary and binary operations. I merged https://github.com/ggerganov/whisper.cpp/pull/2958, which seems to have overwritten the changes to the ggml source that were added in https://github.com/ggerganov/whisper.cpp/pull/2972. Sorry about this.
v1.7.4
Overview
Minor release with mostly build fixes.
What's Changed
- whisper : rename binaries + fix install by @ggerganov in #2648
- feat(server): Add option to suppress non-speech tokens by @sachaarbonel in #2649
- whisper : rename suppress_non_speech_tokens to suppress_nst by @ggerganov in #2653
- feat: expose no-speech probability in segment by @sachaarbonel in #2654
- ruby : bug fix on callbacks and no_speech_prob by @KitaitiMakoto in #2656
- Add no_speech_thold to cli by @alubbe in #2663
- Add --suppress_nst support to cli by @alubbe in #2664
- ruby : Fix of C++ header guard name, model URI support, type signature and more by @KitaitiMakoto in #2683
- Enable Windows cublas build by @niksedk in #2676
- docs: replace Core ML with OpenVINO by @konosky in #2686
- rename ggml-cpu-aarch64.c to .cpp by @ego in #2687
- readme : fix real-time audio input example build instructions by @samueldurantes in #2692
- sync : ggml by @ggerganov in #2699
- cli : fix segfault on missing argument by @redzic in #2700
New Contributors
- @sachaarbonel made their first contribution in #2649
- @alubbe made their first contribution in #2663
- @niksedk made their first contribution in #2676
- @konosky made their first contribution in #2686
- @ego made their first contribution in #2687
- @samueldurantes made their first contribution in #2692
- @redzic made their first contribution in #2700
Full Changelog: v1.7.3...v1.7.4
v1.7.3
Overview
- Massive performance improvements for the Metal backend, especially for beams > 1 and for quantized models
- Reduce hallucinations during silence by @jkarthic in #2629
- Implement no_speech_thold by @jkarthic in #2625
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | Metal | tiny | 1 | 1 | 7.90 | 1.26 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_0 | 1 | 1 | 8.44 | 1.23 | 0.36 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_1 | 1 | 1 | 8.26 | 1.27 | 0.37 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q8_0 | 1 | 1 | 8.03 | 1.21 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | base | 1 | 1 | 13.77 | 1.80 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_0 | 1 | 1 | 15.02 | 1.72 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_1 | 1 | 1 | 14.93 | 1.74 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q8_0 | 1 | 1 | 14.26 | 1.68 | 0.41 | 0.02 | ed733e8 |
M2 Ultra | Metal | small | 1 | 1 | 39.76 | 3.54 | 0.85 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_0 | 1 | 1 | 45.07 | 3.47 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_1 | 1 | 1 | 44.82 | 3.49 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q8_0 | 1 | 1 | 41.79 | 3.30 | 0.84 | 0.05 | ed733e8 |
M2 Ultra | Metal | medium | 1 | 1 | 106.73 | 7.28 | 1.78 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-q5_0 | 1 | 1 | 124.43 | 6.63 | 1.83 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q5_1 | 1 | 1 | 124.19 | 6.70 | 1.84 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q8_0 | 1 | 1 | 113.88 | 6.52 | 1.75 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-dis | 1 | 1 | 94.97 | 0.97 | 0.22 | 0.01 | ed733e8 |
M2 Ultra | Metal | large-v2 | 1 | 1 | 193.33 | 10.53 | 2.65 | 0.20 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_0 | 1 | 1 | 229.22 | 9.52 | 2.72 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_1 | 1 | 1 | 229.40 | 9.62 | 2.73 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q8_0 | 1 | 1 | 207.30 | 9.36 | 2.59 | 0.21 | ed733e8 |
M2 Ultra | Metal | large-v2-dis | 1 | 1 | 171.43 | 1.09 | 0.25 | 0.02 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo | 1 | 1 | 173.45 | 1.73 | 0.41 | 0.03 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q5_0 | 1 | 1 | 205.52 | 1.52 | 0.42 | 0.04 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q8_0 | 1 | 1 | 185.90 | 1.48 | 0.40 | 0.03 | ed733e8 |
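The size of the batched-decoding improvement can be estimated by comparing the `Bch5` column above with the same column in the v1.7.2 table (commit 83ac284, M2 Ultra, Metal, FA on). The following is a rough sketch over a handful of rows. The helper name `improvement` is hypothetical, and it assumes `Bch5` is a timing for decoding with a batch of 5, so lower is better.

```python
# (model, Bch5 in v1.7.2 @ 83ac284, Bch5 in v1.7.3 @ ed733e8)
# Values copied from the M2 Ultra tables of the two releases.
# Assumption: Bch5 is a timing, so a lower number is better.
rows = [
    ("tiny", 0.41, 0.35),
    ("medium", 3.24, 1.78),
    ("medium-q5_0", 3.45, 1.83),
    ("large-v2", 5.06, 2.65),
    ("large-v2-q5_0", 5.56, 2.72),
]


def improvement(old: float, new: float) -> float:
    """Ratio of the older value to the newer one (hypothetical helper)."""
    return old / new


ratios = {model: improvement(old, new) for model, old, new in rows}

for model, ratio in ratios.items():
    print(f"{model:14s} {ratio:.2f}x")
```

On these rows the larger models improve by roughly 1.8x to 2x, and each quantized variant improves slightly more than its non-quantized counterpart, which is consistent with the overview.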
What's Changed
- sync : ggml by @ggerganov in #2573
- ruby : Follow source tree change by @KitaitiMakoto in #2580
- Add `q8_0` models to `download-ggml-model.sh` by @mrienstra in #2589
- ruby : Add low-level methods to transcribe by @KitaitiMakoto in #2585
- sync : ggml by @ggerganov in #2608
- ruby : Sync whisper.cpp and model download feature by @KitaitiMakoto in #2617
- Fix typo in `download-ggml-model.sh` by @mrienstra in #2623
- Add Missing Include Directory for ggml-cpu in whisper.android CMakeLists by @Thamster in #2624
- fix: prevent division by zero in soft_max vulkan shader by @gn64 in #2633
- cmake : fix "amd64" processor string by @ggerganov in #2638
- Fix typo in Java Binding README by @crummyh in #2637
- Fix hallucinations during silence by @jkarthic in #2629
- Implement no_speech_thold by @jkarthic in #2625
- Improve consistency in stream exameple README commands by @crummyh in #2642
- ruby : Add no_speech_thold by @KitaitiMakoto in #2641
- sync : ggml by @ggerganov in #2639
- ci : msys enable SDL2 build by @ggerganov in #2635
New Contributors
- @Thamster made their first contribution in #2624
- @gn64 made their first contribution in #2633
- @crummyh made their first contribution in #2637
- @jkarthic made their first contribution in #2629
Full Changelog: v1.7.2...v1.7.3
v1.7.3-pre
Overview
Massive performance improvements for the Metal backend, especially for beams > 1 and for quantized models.
Setting as "pre-release" since there have been major changes to the build system (now using CMake) and I want to gather some feedback about how well the project builds now on various platforms. Please leave comments in the discussion to help fix any remaining issues before the official release.
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | Metal | tiny | 1 | 1 | 7.90 | 1.26 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_0 | 1 | 1 | 8.44 | 1.23 | 0.36 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_1 | 1 | 1 | 8.26 | 1.27 | 0.37 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q8_0 | 1 | 1 | 8.03 | 1.21 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | base | 1 | 1 | 13.77 | 1.80 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_0 | 1 | 1 | 15.02 | 1.72 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_1 | 1 | 1 | 14.93 | 1.74 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q8_0 | 1 | 1 | 14.26 | 1.68 | 0.41 | 0.02 | ed733e8 |
M2 Ultra | Metal | small | 1 | 1 | 39.76 | 3.54 | 0.85 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_0 | 1 | 1 | 45.07 | 3.47 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_1 | 1 | 1 | 44.82 | 3.49 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q8_0 | 1 | 1 | 41.79 | 3.30 | 0.84 | 0.05 | ed733e8 |
M2 Ultra | Metal | medium | 1 | 1 | 106.73 | 7.28 | 1.78 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-q5_0 | 1 | 1 | 124.43 | 6.63 | 1.83 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q5_1 | 1 | 1 | 124.19 | 6.70 | 1.84 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q8_0 | 1 | 1 | 113.88 | 6.52 | 1.75 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-dis | 1 | 1 | 94.97 | 0.97 | 0.22 | 0.01 | ed733e8 |
M2 Ultra | Metal | large-v2 | 1 | 1 | 193.33 | 10.53 | 2.65 | 0.20 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_0 | 1 | 1 | 229.22 | 9.52 | 2.72 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_1 | 1 | 1 | 229.40 | 9.62 | 2.73 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q8_0 | 1 | 1 | 207.30 | 9.36 | 2.59 | 0.21 | ed733e8 |
M2 Ultra | Metal | large-v2-dis | 1 | 1 | 171.43 | 1.09 | 0.25 | 0.02 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo | 1 | 1 | 173.45 | 1.73 | 0.41 | 0.03 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q5_0 | 1 | 1 | 205.52 | 1.52 | 0.42 | 0.04 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q8_0 | 1 | 1 | 185.90 | 1.48 | 0.40 | 0.03 | ed733e8 |
What's Changed
- sync : ggml by @ggerganov in #2573
- ruby : Follow source tree change by @KitaitiMakoto in #2580
- Add `q8_0` models to `download-ggml-model.sh` by @mrienstra in #2589
- ruby : Add low-level methods to transcribe by @KitaitiMakoto in #2585
- sync : ggml by @ggerganov in #2608
Full Changelog: v1.7.2...v1.7.3-pre
v1.7.2
Overview
- Various improvements in the Metal backend
- Fix extra memory usage for large samples
- Remove limit for `ggml_context` (i.e. more beams and processors are supported)
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 1 | 9.51 | 1.39 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 1 | 9.57 | 1.41 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 1 | 8.74 | 1.39 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q8_0 | 1 | 1 | 8.36 | 1.33 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | base | 1 | 1 | 14.27 | 1.90 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_0 | 1 | 1 | 15.50 | 1.90 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_1 | 1 | 1 | 15.67 | 1.88 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q8_0 | 1 | 1 | 14.69 | 1.81 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | small | 1 | 1 | 40.85 | 3.77 | 1.43 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_0 | 1 | 1 | 45.99 | 3.90 | 1.52 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_1 | 1 | 1 | 46.19 | 3.83 | 1.50 | 0.06 | 83ac284 |
M2 Ultra | METAL | small-q8_0 | 1 | 1 | 42.90 | 3.65 | 1.46 | 0.05 | 83ac284 |
M2 Ultra | METAL | medium | 1 | 1 | 109.01 | 7.59 | 3.24 | 0.11 | 83ac284 |
M2 Ultra | METAL | medium-q5_0 | 1 | 1 | 126.78 | 7.55 | 3.45 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q5_1 | 1 | 1 | 127.71 | 7.39 | 3.43 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q8_0 | 1 | 1 | 115.97 | 7.21 | 3.35 | 0.12 | 83ac284 |
M2 Ultra | METAL | medium-dis | 1 | 1 | 97.74 | 1.06 | 0.36 | 0.01 | 83ac284 |
M2 Ultra | METAL | large-v2 | 1 | 1 | 196.99 | 11.29 | 5.06 | 0.20 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 1 | 233.88 | 10.83 | 5.56 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 1 | 234.03 | 10.73 | 5.46 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q8_0 | 1 | 1 | 210.83 | 10.29 | 5.23 | 0.22 | 83ac284 |
M2 Ultra | METAL | large-v2-dis | 1 | 1 | 175.37 | 1.18 | 0.42 | 0.02 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo | 1 | 1 | 177.35 | 1.85 | 0.73 | 0.03 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 1 | 209.31 | 1.69 | 0.80 | 0.04 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q8_0 | 1 | 1 | 189.55 | 1.64 | 0.75 | 0.03 | 83ac284 |
What's Changed
- Added OpenVino init on state by @sandrohanea in #2464
- Updating the Quick start by @stsfaroz in #2475
- max_length from max_target_positions by @CrispStrobe in #2477
- Add dtw preset for large-v3-turbo by @rotemdan in #2481
- make : fix GGML_VULKAN=1 build by @ggerganov in #2485
- Add Vulkan notice in README.md by @toboil-features in #2488
- Fix Ruby binding building by @KitaitiMakoto in #2484
- Update of README.md by @toboil-features in #2489
- whisper: fix index overflow by @Josscii in #2505
- ruby : Add Metal support by @KitaitiMakoto in #2516
- ruby: New segment callback by @KitaitiMakoto in #2506
- ruby : add more APIs by @KitaitiMakoto in #2518
- ruby: fix installation test by @KitaitiMakoto in #2519
- When DTW timestamps are enabled, defer new_segment_callback until after DTW compute step by @jettoblack in #2515
- ci : fix openblas build by @ggerganov in #2511
- whisper : reduce ggml_context usage by @ggerganov in #2525
- sync : ggml by @ggerganov in #2528
- passing samples_padded by ref to the threads. by @vinmisra in #2534
- fix ffmpeg v5 build by @stsydow in #2543
- fix: ggml-vulkan logs by @thewh1teagle in #2547
- Fix the instructions on the Ruby binding by @wilsonsilva in #2548
- whisper.swiftui : add model download list & bench methods by @jhen0409 in #2546
- ruby : Add more API by @KitaitiMakoto in #2551
- Fix building workflow for linux/arm64 container by @rai62 in #2555
- sync : ggml by @ggerganov in #2561
- whisper.swiftui : switch Mac dest to Mac (Designed for iPad) by @jhen0409 in #2562
- ci : use local ggml by @ggerganov in #2567
- sycl: fix example build by @stsydow in #2570
New Contributors
- @stsfaroz made their first contribution in #2475
- @CrispStrobe made their first contribution in #2477
- @toboil-features made their first contribution in #2488
- @KitaitiMakoto made their first contribution in #2484
- @Josscii made their first contribution in #2505
- @jettoblack made their first contribution in #2515
- @vinmisra made their first contribution in #2534
- @stsydow made their first contribution in #2543
- @wilsonsilva made their first contribution in #2548
- @rai62 made their first contribution in #2555
Full Changelog: v1.7.1...v1.7.2
v1.7.2-pre
Overview
This is a pre-release since I think there have been some reports about memory leaks which I haven't had the time to investigate and confirm. If these are resolved in the next few days, I will include the fixes in the official 1.7.2 release next week.
- Various improvements in the Metal backend
- Fix extra memory usage for large samples
- Remove limit for `ggml_context` (i.e. more beams and processors are supported)
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 1 | 9.51 | 1.39 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 1 | 9.57 | 1.41 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 1 | 8.74 | 1.39 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q8_0 | 1 | 1 | 8.36 | 1.33 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | base | 1 | 1 | 14.27 | 1.90 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_0 | 1 | 1 | 15.50 | 1.90 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_1 | 1 | 1 | 15.67 | 1.88 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q8_0 | 1 | 1 | 14.69 | 1.81 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | small | 1 | 1 | 40.85 | 3.77 | 1.43 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_0 | 1 | 1 | 45.99 | 3.90 | 1.52 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_1 | 1 | 1 | 46.19 | 3.83 | 1.50 | 0.06 | 83ac284 |
M2 Ultra | METAL | small-q8_0 | 1 | 1 | 42.90 | 3.65 | 1.46 | 0.05 | 83ac284 |
M2 Ultra | METAL | medium | 1 | 1 | 109.01 | 7.59 | 3.24 | 0.11 | 83ac284 |
M2 Ultra | METAL | medium-q5_0 | 1 | 1 | 126.78 | 7.55 | 3.45 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q5_1 | 1 | 1 | 127.71 | 7.39 | 3.43 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q8_0 | 1 | 1 | 115.97 | 7.21 | 3.35 | 0.12 | 83ac284 |
M2 Ultra | METAL | medium-dis | 1 | 1 | 97.74 | 1.06 | 0.36 | 0.01 | 83ac284 |
M2 Ultra | METAL | large-v2 | 1 | 1 | 196.99 | 11.29 | 5.06 | 0.20 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 1 | 233.88 | 10.83 | 5.56 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 1 | 234.03 | 10.73 | 5.46 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q8_0 | 1 | 1 | 210.83 | 10.29 | 5.23 | 0.22 | 83ac284 |
M2 Ultra | METAL | large-v2-dis | 1 | 1 | 175.37 | 1.18 | 0.42 | 0.02 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo | 1 | 1 | 177.35 | 1.85 | 0.73 | 0.03 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 1 | 209.31 | 1.69 | 0.80 | 0.04 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q8_0 | 1 | 1 | 189.55 | 1.64 | 0.75 | 0.03 | 83ac284 |
What's Changed
- Added OpenVino init on state by @sandrohanea in #2464
- Updating the Quick start by @stsfaroz in #2475
- max_length from max_target_positions by @CrispStrobe in #2477
- Add dtw preset for large-v3-turbo by @rotemdan in #2481
- make : fix GGML_VULKAN=1 build by @ggerganov in #2485
- Add Vulkan notice in README.md by @toboil-features in #2488
- Fix Ruby binding building by @KitaitiMakoto in #2484
- Update of README.md by @toboil-features in #2489
- whisper: fix index overflow by @Josscii in #2505
- ruby : Add Metal support by @KitaitiMakoto in #2516
- ruby: New segment callback by @KitaitiMakoto in #2506
- ruby : add more APIs by @KitaitiMakoto in #2518
- ruby: fix installation test by @KitaitiMakoto in #2519
- When DTW timestamps are enabled, defer new_segment_callback until after DTW compute step by @jettoblack in #2515
- ci : fix openblas build by @ggerganov in #2511
- whisper : reduce ggml_context usage by @ggerganov in #2525
- sync : ggml by @ggerganov in #2528
- passing samples_padded by ref to the threads. by @vinmisra in #2534
- fix ffmpeg v5 build by @stsydow in #2543
- fix: ggml-vulkan logs by @thewh1teagle in #2547
- Fix the instructions on the Ruby binding by @wilsonsilva in #2548
- whisper.swiftui : add model download list & bench methods by @jhen0409 in #2546
- ruby : Add more API by @KitaitiMakoto in #2551
- Fix building workflow for linux/arm64 container by @rai62 in #2555
- sync : ggml by @ggerganov in #2561
- whisper.swiftui : switch Mac dest to Mac (Designed for iPad) by @jhen0409 in #2562
New Contributors
- @stsfaroz made their first contribution in #2475
- @CrispStrobe made their first contribution in #2477
- @toboil-features made their first contribution in #2488
- @KitaitiMakoto made their first contribution in #2484
- @Josscii made their first contribution in #2505
- @jettoblack made their first contribution in #2515
- @vinmisra made their first contribution in #2534
- @stsydow made their first contribution in #2543
- @wilsonsilva made their first contribution in #2548
- @rai62 made their first contribution in #2555
Full Changelog: v1.7.1...v1.7.2-pre
v1.7.1
Overview
- Fix Vulkan crashes
- Performance stats for Vulkan on RTX 2060
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | VULKAN | tiny | 1 | 0 | 30.38 | 1.37 | 1.04 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_0 | 1 | 0 | 20.98 | 1.38 | 0.99 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_1 | 1 | 0 | 20.74 | 1.30 | 0.96 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | base | 1 | 0 | 44.69 | 1.59 | 1.78 | 0.09 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_0 | 1 | 0 | 39.72 | 2.11 | 1.72 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_1 | 1 | 0 | 39.45 | 2.01 | 1.63 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | small | 1 | 0 | 160.02 | 3.53 | 4.64 | 0.23 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_0 | 1 | 0 | 141.52 | 4.54 | 4.44 | 0.20 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_1 | 1 | 0 | 141.03 | 4.63 | 4.18 | 0.20 | 9f346d0 |
RTX 2060 | VULKAN | medium | 1 | 0 | 472.66 | 7.55 | 11.35 | 0.56 | 9f346d0 |
RTX 2060 | VULKAN | medium-q5_0 | 1 | 0 | 395.55 | 9.81 | 10.64 | 0.49 | 9f346d0 |
RTX 2060 | VULKAN | medium-q5_1 | 1 | 0 | 398.85 | 10.16 | 10.15 | 0.50 | 9f346d0 |
RTX 2060 | VULKAN | medium-dis | 1 | 0 | 427.26 | 1.26 | 1.20 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | large-v2 | 1 | 0 | 924.60 | 12.36 | 18.56 | 1.01 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-q5_0 | 1 | 0 | 774.21 | 17.25 | 17.17 | 0.85 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-q5_1 | 1 | 0 | 779.75 | 17.44 | 16.27 | 0.85 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-dis | 1 | 0 | 833.35 | 1.38 | 1.56 | 0.10 | 9f346d0 |
RTX 2060 | VULKAN | large-v3-turbo | 1 | 0 | 839.90 | 2.11 | 2.70 | 0.16 | 9f346d0 |
RTX 2060 | VULKAN | large-v3-turbo-q5_0 | 1 | 0 | 705.49 | 3.22 | 2.53 | 0.14 | 9f346d0 |
What's Changed
- Retry allocation with fallback flags by @SRHMorris in #2451
New Contributors
- @SRHMorris made their first contribution in #2451
Full Changelog: v1.7.0...v1.7.1
Binaries
https://github.com/ggerganov/whisper.cpp/actions/runs/11213279590
v1.7.0
Overview
- Fix crashes with high number of beams
- Reduce overall VRAM usage
- Optimize Encoder performance
Some performance numbers for this release:
M2 Ultra
Flash Attention ON:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 1 | 8.37 | 1.44 | 0.48 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 1 | 9.81 | 1.46 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 1 | 8.80 | 1.47 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | base | 1 | 1 | 16.11 | 1.96 | 0.74 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_0 | 1 | 1 | 16.38 | 1.99 | 0.78 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_1 | 1 | 1 | 16.72 | 2.00 | 0.77 | 0.02 | 6a94163 |
M2 Ultra | METAL | small | 1 | 1 | 41.26 | 3.88 | 1.66 | 0.05 | 6a94163 |
M2 Ultra | METAL | small-q5_0 | 1 | 1 | 46.91 | 4.02 | 1.76 | 0.06 | 6a94163 |
M2 Ultra | METAL | small-q5_1 | 1 | 1 | 47.05 | 4.00 | 1.73 | 0.06 | 6a94163 |
M2 Ultra | METAL | medium | 1 | 1 | 111.29 | 7.79 | 3.63 | 0.11 | 6a94163 |
M2 Ultra | METAL | medium-q5_0 | 1 | 1 | 129.78 | 7.71 | 3.85 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-q5_1 | 1 | 1 | 129.29 | 7.71 | 3.87 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-dis | 1 | 1 | 99.27 | 1.09 | 0.43 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v2 | 1 | 1 | 198.81 | 11.54 | 5.59 | 0.20 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 1 | 236.18 | 11.12 | 6.11 | 0.24 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 1 | 235.88 | 11.14 | 6.01 | 0.24 | 6a94163 |
M2 Ultra | METAL | large-v2-dis | 1 | 1 | 177.41 | 1.21 | 0.48 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo | 1 | 1 | 178.92 | 1.89 | 0.83 | 0.03 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 1 | 211.44 | 1.73 | 0.90 | 0.04 | 6a94163 |
Flash Attention OFF:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 0 | 10.04 | 1.37 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 0 | 10.02 | 1.36 | 0.53 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 0 | 11.08 | 1.37 | 0.53 | 0.01 | 6a94163 |
M2 Ultra | METAL | base | 1 | 0 | 17.84 | 1.93 | 0.77 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_0 | 1 | 0 | 18.57 | 1.92 | 0.81 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_1 | 1 | 0 | 18.66 | 1.93 | 0.82 | 0.02 | 6a94163 |
M2 Ultra | METAL | small | 1 | 0 | 48.26 | 3.95 | 1.73 | 0.05 | 6a94163 |
M2 Ultra | METAL | small-q5_0 | 1 | 0 | 53.68 | 3.99 | 1.85 | 0.06 | 6a94163 |
M2 Ultra | METAL | small-q5_1 | 1 | 0 | 53.86 | 4.00 | 1.82 | 0.06 | 6a94163 |
M2 Ultra | METAL | medium | 1 | 0 | 130.09 | 8.01 | 3.82 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-q5_0 | 1 | 0 | 148.18 | 7.92 | 4.11 | 0.14 | 6a94163 |
M2 Ultra | METAL | medium-q5_1 | 1 | 0 | 147.95 | 7.94 | 4.11 | 0.14 | 6a94163 |
M2 Ultra | METAL | medium-dis | 1 | 0 | 116.97 | 1.11 | 0.42 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v2 | 1 | 0 | 232.43 | 12.34 | 5.87 | 0.22 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 0 | 269.72 | 11.68 | 6.44 | 0.26 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 0 | 269.71 | 11.82 | 6.36 | 0.26 | 6a94163 |
M2 Ultra | METAL | large-v2-dis | 1 | 0 | 209.25 | 1.25 | 0.48 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo | 1 | 0 | 211.09 | 1.98 | 0.84 | 0.03 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 0 | 244.23 | 1.81 | 0.92 | 0.04 | 6a94163 |
Ryzen 9 5950X + RTX 2060
Flash Attention ON:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | AVX2 CUDA | tiny | 1 | 1 | 7.35 | 0.78 | 0.24 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_0 | 1 | 1 | 6.45 | 0.67 | 0.14 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_1 | 1 | 1 | 6.39 | 0.66 | 0.14 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | base | 1 | 1 | 10.20 | 0.88 | 0.30 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_0 | 1 | 1 | 11.38 | 0.92 | 0.21 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_1 | 1 | 1 | 11.76 | 0.91 | 0.20 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | small | 1 | 1 | 33.06 | 2.00 | 0.56 | 0.03 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_0 | 1 | 1 | 35.84 | 1.84 | 0.43 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_1 | 1 | 1 | 36.89 | 1.82 | 0.42 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium | 1 | 1 | 90.65 | 4.54 | 1.13 | 0.08 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_0 | 1 | 1 | 104.01 | 3.80 | 0.91 | 0.10 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_1 | 1 | 1 | 107.98 | 3.72 | 0.87 | 0.10 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-dis | 1 | 1 | 79.08 | 0.68 | 0.17 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2 | 1 | 1 | 162.00 | 7.52 | 1.92 | 0.14 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 1 | 1 | 184.59 | 5.64 | 1.50 | 0.16 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 1 | 1 | 193.85 | 5.55 | 1.44 | 0.17 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-dis | 1 | 1 | 140.75 | 0.84 | 0.37 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo | 1 | 1 | 143.38 | 1.29 | 0.36 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo-q5_0 | 1 | 1 | 163.30 | 0.93 | 0.28 | 0.03 | 6a94163 |
Flash Attention OFF:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | AVX2 CUDA | tiny | 1 | 0 | 12.49 | 0.87 | 0.23 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_0 | 1 | 0 | 10.65 | 0.78 | 0.19 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_1 | 1 | 0 | 10.82 | 0.77 | 0.19 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base | 1 | 0 | 18.97 | 1.04 | 0.34 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_0 | 1 | 0 | 20.22 | 1.09 | 0.27 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_1 | 1 | 0 | 20.48 | 1.07 | 0.27 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | small | 1 | 0 | 59.52 | 2.37 | 0.70 | 0.05 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_0 | 1 | 0 | 62.98 | 2.23 | 0.60 | 0.06 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_1 | 1 | 0 | 63.64 | 2.21 | 0.59 | 0.06 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium | 1 | 0 | 161.53 | 5.36 | 1.53 | 0.13 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_0 | 1 | 0 | 174.96 | 4.64 | 1.32 | 0.15 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_1 | 1 | 0 | 178.42 | 4.57 | 1.29 | 0.15 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-dis | 1 | 0 | 149.65 | 0.75 | 0.20 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2 | 1 | 0 | 280.55 | 8.74 | 2.51 | 0.23 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 1 | 0 | 306.87 | 6.92 | 2.08 | 0.25 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 1 | 0 | 314.25 | 6.82 | 2.02 | 0.26 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-dis | 1 | 0 | 259.39 | 0.91 | 0.37 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo | 1 | 0 | 261.83 | 1.44 | 0.41 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo-q5_0 | 1 | 0 | 282.99 | 1.09 | 0.33 | 0.04 | 6a94163 |
Vulkan:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | VULKAN | tiny | 1 | 0 | 30.38 | 1.37 | 1.04 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_0 | 1 | 0 | 20.98 | 1.38 | 0.99 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_1 | 1 | 0 | 20.74 | 1.30 | 0.96 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | base | 1 | 0 | 44.69 | 1.59 | 1.78 | 0.09 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_0 | 1 | 0 | 39.72 | 2.11 | 1.72 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_1 | 1 | 0 | 39.45 | 2.01 | 1.63 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | small | 1 | 0 | 160.02 | 3.53 | 4.64 | 0.23 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_0 | 1 | 0 | 141.52 | 4.54 | 4.44 | 0.20 | 9f346d0 |
RTX 2060 | VULKA... |
v1.6.2
Overview
Bugfix when using multiple `whisper_state` in parallel: #2182
What's Changed
- Update ruby bindings by @taf2 in #2154
- Update server.cpp by @dvaldivia in #2181
- Revert "whisper : remove extra backend instance (huh?)" by @ggerganov in #2182
New Contributors
- @dvaldivia made their first contribution in #2181
Full Changelog: v1.6.1...v1.6.2