Releases: ggml-org/whisper.cpp

v1.7.5

02 Apr 14:34

Overview

This is a relatively big update with various build and CI improvements, especially for iOS and WASM. There are also some performance gains, especially for the Metal backend and probably for Arm-based devices.

Big shoutout to @danbev for stepping up and completing the maintenance roadmap for this release!

Mobile examples

All mobile examples have been refreshed. The iOS examples specifically are now much easier to build thanks to the new XCFramework workflow. This should significantly simplify integrating whisper.cpp into third-party iOS and macOS apps. CoreML build and convert instructions have also been updated.

WASM examples

The WASM examples are now automatically updated on each new commit and hosted on GitHub Pages at https://ggerganov.github.io/whisper.cpp/. Problems with CORS rules should be resolved.


Some performance numbers for this release:

M2 Ultra

Flash Attention ON:

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 ULTRA METAL tiny 1 1 7.82 1.31 0.35 0.01 ad4e350
M2 ULTRA METAL tiny-q5_0 1 1 8.32 1.28 0.37 0.01 ad4e350
M2 ULTRA METAL tiny-q5_1 1 1 8.21 1.28 0.37 0.01 ad4e350
M2 ULTRA METAL tiny-q8_0 1 1 7.97 1.23 0.36 0.01 ad4e350
M2 ULTRA METAL base 1 1 13.96 1.80 0.42 0.02 ad4e350
M2 ULTRA METAL base-q5_0 1 1 15.19 1.75 0.42 0.02 ad4e350
M2 ULTRA METAL base-q5_1 1 1 15.09 1.75 0.42 0.02 ad4e350
M2 ULTRA METAL base-q8_0 1 1 14.45 1.70 0.41 0.02 ad4e350
M2 ULTRA METAL small 1 1 40.08 3.54 0.86 0.05 ad4e350
M2 ULTRA METAL small-q5_0 1 1 45.07 3.51 0.88 0.05 ad4e350
M2 ULTRA METAL small-q5_1 1 1 45.05 3.52 0.88 0.05 ad4e350
M2 ULTRA METAL small-q8_0 1 1 42.04 3.34 0.85 0.05 ad4e350
M2 ULTRA METAL medium 1 1 107.20 7.28 1.79 0.11 ad4e350
M2 ULTRA METAL medium-q5_0 1 1 125.02 6.67 1.83 0.12 ad4e350
M2 ULTRA METAL medium-q5_1 1 1 124.83 6.70 1.84 0.12 ad4e350
M2 ULTRA METAL medium-q8_0 1 1 114.56 6.53 1.79 0.11 ad4e350
M2 ULTRA METAL medium-dis 1 1 95.96 1.01 0.23 0.01 ad4e350
M2 ULTRA METAL large-v2 1 1 194.29 10.57 2.67 0.20 ad4e350
M2 ULTRA METAL large-v2-q5_0 1 1 230.74 9.57 2.73 0.23 ad4e350
M2 ULTRA METAL large-v2-q5_1 1 1 229.97 9.69 2.74 0.23 ad4e350
M2 ULTRA METAL large-v2-q8_0 1 1 208.11 9.37 2.60 0.21 ad4e350
M2 ULTRA METAL large-v2-dis 1 1 172.72 1.12 0.26 0.02 ad4e350
M2 ULTRA METAL large-v3-turbo 1 1 174.46 1.74 0.42 0.03 ad4e350
M2 ULTRA METAL large-v3-turbo-q5_0 1 1 205.78 1.54 0.42 0.04 ad4e350
M2 ULTRA METAL large-v3-turbo-q8_0 1 1 186.33 1.50 0.40 0.03 ad4e350

Flash Attention OFF:

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 ULTRA METAL tiny 1 0 8.74 1.20 0.36 0.01 ad4e350
M2 ULTRA METAL tiny-q5_0 1 0 10.30 1.15 0.38 0.01 ad4e350
M2 ULTRA METAL tiny-q5_1 1 0 10.71 1.13 0.38 0.01 ad4e350
M2 ULTRA METAL tiny-q8_0 1 0 9.97 1.12 0.37 0.01 ad4e350
M2 ULTRA METAL base 1 0 16.77 1.71 0.44 0.02 ad4e350
M2 ULTRA METAL base-q5_0 1 0 16.92 1.63 0.44 0.02 ad4e350
M2 ULTRA METAL base-q5_1 1 0 16.84 1.63 0.44 0.02 ad4e350
M2 ULTRA METAL base-q8_0 1 0 16.12 1.63 0.44 0.02 ad4e350
M2 ULTRA METAL small 1 0 45.29 3.44 0.92 0.05 ad4e350
M2 ULTRA METAL small-q5_0 1 0 50.43 3.34 0.94 0.06 ad4e350
M2 ULTRA METAL small-q5_1 1 0 50.49 3.35 0.93 0.06 ad4e350
M2 ULTRA METAL small-q8_0 1 0 47.37 3.20 0.91 0.05 ad4e350
M2 ULTRA METAL medium 1 0 122.81 7.39 1.99 0.12 ad4e350
M2 ULTRA METAL medium-q5_0 1 0 140.62 6.73 2.03 0.14 ad4e350
M2 ULTRA METAL medium-q5_1 1 0 140.44 6.74 2.04 0.14 ad4e350
M2 ULTRA METAL medium-q8_0 1 0 131.05 6.54 1.95 0.13 ad4e350
M2 ULTRA METAL medium-dis 1 0 110.95 0.99 0.24 0.02 ad4e350
M2 ULTRA METAL large-v2 1 0 222.19 10.93 3.01 0.21 ad4e350
M2 ULTRA METAL large-v2-q5_0 1 0 258.47 9.75 3.01 0.25 ad4e350
M2 ULTRA METAL large-v2-q5_1 1 0 258.40 9.85 3.01 0.24 ad4e350
M2 ULTRA METAL large-v2-q8_0 1 0 236.68 9.61 2.85 0.23 ad4e350
M2 ULTRA METAL large-v2-dis 1 0 199.28 1.12 0.27 0.02 ad4e350
M2 ULTRA METAL large-v3-turbo 1 0 201.49 1.76 0.45 0.03 ad4e350
M2 ULTRA METAL large-v3-turbo-q5_0 1 0 233.70 1.55 0.46 0.04 ad4e350
M2 ULTRA METAL large-v3-turbo-q8_0 1 0 214.20 1.51 0.44 0.04 ad4e350
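
The two M2 Ultra tables above can be compared row by row. The following is a minimal sketch, not part of the release: it assumes the Enc. column is a timing where lower is better, copies a handful of Enc. values from the tables, and uses a hypothetical helper named `encoder_speedup`.

```python
# Enc. values for M2 Ultra / Metal, copied from the two tables above.
# Each entry maps a model name to (Flash Attention ON, Flash Attention OFF).
ENC_M2_ULTRA = {
    "tiny": (7.82, 8.74),
    "base": (13.96, 16.77),
    "small": (40.08, 45.29),
    "medium": (107.20, 122.81),
    "large-v2": (194.29, 222.19),
    "large-v3-turbo": (174.46, 201.49),
}


def encoder_speedup(fa_on: float, fa_off: float) -> float:
    """Ratio of the FA-off timing to the FA-on timing.

    Assumes both values are timings, so a result above 1.0 means the
    encoder ran faster with Flash Attention enabled.
    """
    return fa_off / fa_on


for model, (fa_on, fa_off) in ENC_M2_ULTRA.items():
    print(f"{model:<16} {encoder_speedup(fa_on, fa_off):.2f}x")
```

For the rows listed here the ratio lands between about 1.1x and 1.2x.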

M4 Max

Flash Attention ON:

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M4 Max METAL tiny 1 1 15.22 0.89 0.26 0.01 ad4e350
M4 Max METAL tiny-q8_0 1 1 14.70 0.86 0.26 0.01 ad4e350
M4 Max METAL base 1 1 25.33 1.36 0.30 0.02 ad4e350
M4 Max METAL base-q8_0 1 1 21.27 1.31 0.30 0.02 ad4e350
M4 Max METAL small 1 1 58.43 2.78 0.60 0.05 ad4e350
M4 Max METAL small-q8_0 1 1 60.26 2.39 0.60 0.05 ad4e350
M4 Max METAL medium 1 1 169.73 6.03 1.31 0.14 ad4e350
M4 Max METAL medium-q8_0 1 1 176.61 4.99 1.31 0.14 ad4e350
M4 Max METAL large-v2 1 1 316.18 9.60 2.08 0.24 ad4e350
M4 Max METAL large-v2-q8_0 1 1 329.59 7.55 2.08 0.25 ad4e350

Flash Attention OFF:

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M4 Max METAL tiny 1 0 13.12 0.87 0.29 0.01 ad4e350
M4 Max METAL tiny-q8_0 1 0 15.90 0.88 0.31 0.01 ad4e350
M4 Max METAL base 1 0 23.10 1.42 0.34 0.02 ad4e350
M4 Max METAL base-q8_0 1 0 27.25 1.31 0.34 0.02 ad4e350
M4 Max METAL small 1 0 71.76 3.02 0.70 0.06 ad4e350
M4 Max METAL small-q8_0 1 0 73.88 2.60 0.71 0.06 ad4e350
M4 Max METAL medium 1 0 208.22 6.94 1.55 0.16 ad4e350
M4 Max METAL medium-q8_0 1 0 214.65 5.90 1.57 0.17 ad4e350
M4 Max METAL large-v2 1 0 381.72 11.28 2.51 0.29 ad4e350
M4 Max METAL large-v2-q8_0 1 0 394.97 8.90 2.45 0.30 ad4e350

V100

Flash Attention ON:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
V100 AVX2 CUDA tiny 8 1 4.01 0.90 0.25 0.01 ad4e350
V100 AVX2 CUDA tiny-q5_1 8 1 4.12 0.88 0.18 0.01 ad4e350
V100 AVX2 CUDA base 8 1 7.00 1.30 0.35 0.01 ad4e350
V100 AVX2 CUDA base-q5_1 8 1 7.22 1.21 0.26 0.02 ad4e350
V100 AVX2 CUDA small 8 1 18.68 2.39 0.69 0.03 ad4e350
V100 AVX2 CUDA small-q5_1 8 1 19.38 2.32 0.51 0.03 ad4e350
V100 AVX2 CUDA medium 8 1 53.17 5.15 1.45 0.06 ad4e350
V100 AVX2 CUDA medium-q5_...

b2365

31 Mar 15:04
e153b8e
android.java : re-add ggml source updates (#2975)

This commit updates the ggml source to include the new unary and binary
operations. I merged https://github.com/ggerganov/whisper.cpp/pull/2958
which seems to have overwritten the changes to the ggml source which
were added in https://github.com/ggerganov/whisper.cpp/pull/2972.

Sorry about this.

v1.7.4

06 Jan 13:16
8a9ad78

Overview

Minor release with mostly build fixes.

Full Changelog: v1.7.3...v1.7.4

v1.7.3

18 Dec 16:15
3de9dee

Overview

  • Massive performance improvements for the Metal backend, especially for beams > 1 and for quantized models
  • Reduce hallucinations during silence by @jkarthic in #2629
  • Implement no_speech_thold by @jkarthic in #2625
CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 Ultra Metal tiny 1 1 7.90 1.26 0.35 0.01 ed733e8
M2 Ultra Metal tiny-q5_0 1 1 8.44 1.23 0.36 0.01 ed733e8
M2 Ultra Metal tiny-q5_1 1 1 8.26 1.27 0.37 0.01 ed733e8
M2 Ultra Metal tiny-q8_0 1 1 8.03 1.21 0.35 0.01 ed733e8
M2 Ultra Metal base 1 1 13.77 1.80 0.42 0.02 ed733e8
M2 Ultra Metal base-q5_0 1 1 15.02 1.72 0.42 0.02 ed733e8
M2 Ultra Metal base-q5_1 1 1 14.93 1.74 0.42 0.02 ed733e8
M2 Ultra Metal base-q8_0 1 1 14.26 1.68 0.41 0.02 ed733e8
M2 Ultra Metal small 1 1 39.76 3.54 0.85 0.05 ed733e8
M2 Ultra Metal small-q5_0 1 1 45.07 3.47 0.87 0.05 ed733e8
M2 Ultra Metal small-q5_1 1 1 44.82 3.49 0.87 0.05 ed733e8
M2 Ultra Metal small-q8_0 1 1 41.79 3.30 0.84 0.05 ed733e8
M2 Ultra Metal medium 1 1 106.73 7.28 1.78 0.11 ed733e8
M2 Ultra Metal medium-q5_0 1 1 124.43 6.63 1.83 0.12 ed733e8
M2 Ultra Metal medium-q5_1 1 1 124.19 6.70 1.84 0.12 ed733e8
M2 Ultra Metal medium-q8_0 1 1 113.88 6.52 1.75 0.11 ed733e8
M2 Ultra Metal medium-dis 1 1 94.97 0.97 0.22 0.01 ed733e8
M2 Ultra Metal large-v2 1 1 193.33 10.53 2.65 0.20 ed733e8
M2 Ultra Metal large-v2-q5_0 1 1 229.22 9.52 2.72 0.23 ed733e8
M2 Ultra Metal large-v2-q5_1 1 1 229.40 9.62 2.73 0.23 ed733e8
M2 Ultra Metal large-v2-q8_0 1 1 207.30 9.36 2.59 0.21 ed733e8
M2 Ultra Metal large-v2-dis 1 1 171.43 1.09 0.25 0.02 ed733e8
M2 Ultra Metal large-v3-turbo 1 1 173.45 1.73 0.41 0.03 ed733e8
M2 Ultra Metal large-v3-turbo-q5_0 1 1 205.52 1.52 0.42 0.04 ed733e8
M2 Ultra Metal large-v3-turbo-q8_0 1 1 185.90 1.48 0.40 0.03 ed733e8
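
To put the "beams > 1" improvement in numbers, the Bch5 column of this table can be set against the same column in the v1.7.2 table further down the page. This is a rough sketch rather than an official comparison: it assumes Bch5 is a timing where lower is better, and `percent_reduction` is a hypothetical helper.

```python
# Bch5 values for M2 Ultra / Metal with FA=1, copied from the release tables.
# Each entry maps a model name to (v1.7.2 at 83ac284, v1.7.3 at ed733e8).
BCH5 = {
    "tiny": (0.41, 0.35),
    "base": (0.63, 0.42),
    "small": (1.43, 0.85),
    "medium": (3.24, 1.78),
    "large-v2": (5.06, 2.65),
    "large-v2-q5_0": (5.56, 2.72),
    "large-v3-turbo": (0.73, 0.41),
}


def percent_reduction(old: float, new: float) -> float:
    """Relative drop from old to new, expressed as a percentage of old."""
    return (old - new) / old * 100.0


for model, (old, new) in BCH5.items():
    print(f"{model:<16} {old:5.2f} -> {new:5.2f}  ({percent_reduction(old, new):.1f}% lower)")
```

On these rows the drop ranges from about 15% for tiny to about 51% for large-v2-q5_0.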

Full Changelog: v1.7.2...v1.7.3

v1.7.3-pre

09 Dec 09:34
ed733e8
Pre-release

Overview

Massive performance improvements for the Metal backend, especially for beams > 1 and for quantized models.
Marking this as a "pre-release" since there have been major changes to the build system (now using CMake) and I want to gather some feedback about how well the project builds on various platforms. Please leave comments in the discussion to help fix any remaining issues before the official release.

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 Ultra Metal tiny 1 1 7.90 1.26 0.35 0.01 ed733e8
M2 Ultra Metal tiny-q5_0 1 1 8.44 1.23 0.36 0.01 ed733e8
M2 Ultra Metal tiny-q5_1 1 1 8.26 1.27 0.37 0.01 ed733e8
M2 Ultra Metal tiny-q8_0 1 1 8.03 1.21 0.35 0.01 ed733e8
M2 Ultra Metal base 1 1 13.77 1.80 0.42 0.02 ed733e8
M2 Ultra Metal base-q5_0 1 1 15.02 1.72 0.42 0.02 ed733e8
M2 Ultra Metal base-q5_1 1 1 14.93 1.74 0.42 0.02 ed733e8
M2 Ultra Metal base-q8_0 1 1 14.26 1.68 0.41 0.02 ed733e8
M2 Ultra Metal small 1 1 39.76 3.54 0.85 0.05 ed733e8
M2 Ultra Metal small-q5_0 1 1 45.07 3.47 0.87 0.05 ed733e8
M2 Ultra Metal small-q5_1 1 1 44.82 3.49 0.87 0.05 ed733e8
M2 Ultra Metal small-q8_0 1 1 41.79 3.30 0.84 0.05 ed733e8
M2 Ultra Metal medium 1 1 106.73 7.28 1.78 0.11 ed733e8
M2 Ultra Metal medium-q5_0 1 1 124.43 6.63 1.83 0.12 ed733e8
M2 Ultra Metal medium-q5_1 1 1 124.19 6.70 1.84 0.12 ed733e8
M2 Ultra Metal medium-q8_0 1 1 113.88 6.52 1.75 0.11 ed733e8
M2 Ultra Metal medium-dis 1 1 94.97 0.97 0.22 0.01 ed733e8
M2 Ultra Metal large-v2 1 1 193.33 10.53 2.65 0.20 ed733e8
M2 Ultra Metal large-v2-q5_0 1 1 229.22 9.52 2.72 0.23 ed733e8
M2 Ultra Metal large-v2-q5_1 1 1 229.40 9.62 2.73 0.23 ed733e8
M2 Ultra Metal large-v2-q8_0 1 1 207.30 9.36 2.59 0.21 ed733e8
M2 Ultra Metal large-v2-dis 1 1 171.43 1.09 0.25 0.02 ed733e8
M2 Ultra Metal large-v3-turbo 1 1 173.45 1.73 0.41 0.03 ed733e8
M2 Ultra Metal large-v3-turbo-q5_0 1 1 205.52 1.52 0.42 0.04 ed733e8
M2 Ultra Metal large-v3-turbo-q8_0 1 1 185.90 1.48 0.40 0.03 ed733e8

Full Changelog: v1.7.2...v1.7.3-pre

v1.7.2

19 Nov 16:55
6266a9f

Overview

  • Various improvements in the Metal backend
  • Fix extra memory usage for large samples
  • Remove limit for ggml_context (i.e. more beams and processors are supported)
CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 Ultra METAL tiny 1 1 9.51 1.39 0.41 0.01 83ac284
M2 Ultra METAL tiny-q5_0 1 1 9.57 1.41 0.42 0.01 83ac284
M2 Ultra METAL tiny-q5_1 1 1 8.74 1.39 0.42 0.01 83ac284
M2 Ultra METAL tiny-q8_0 1 1 8.36 1.33 0.41 0.01 83ac284
M2 Ultra METAL base 1 1 14.27 1.90 0.63 0.02 83ac284
M2 Ultra METAL base-q5_0 1 1 15.50 1.90 0.65 0.02 83ac284
M2 Ultra METAL base-q5_1 1 1 15.67 1.88 0.65 0.02 83ac284
M2 Ultra METAL base-q8_0 1 1 14.69 1.81 0.63 0.02 83ac284
M2 Ultra METAL small 1 1 40.85 3.77 1.43 0.05 83ac284
M2 Ultra METAL small-q5_0 1 1 45.99 3.90 1.52 0.05 83ac284
M2 Ultra METAL small-q5_1 1 1 46.19 3.83 1.50 0.06 83ac284
M2 Ultra METAL small-q8_0 1 1 42.90 3.65 1.46 0.05 83ac284
M2 Ultra METAL medium 1 1 109.01 7.59 3.24 0.11 83ac284
M2 Ultra METAL medium-q5_0 1 1 126.78 7.55 3.45 0.13 83ac284
M2 Ultra METAL medium-q5_1 1 1 127.71 7.39 3.43 0.13 83ac284
M2 Ultra METAL medium-q8_0 1 1 115.97 7.21 3.35 0.12 83ac284
M2 Ultra METAL medium-dis 1 1 97.74 1.06 0.36 0.01 83ac284
M2 Ultra METAL large-v2 1 1 196.99 11.29 5.06 0.20 83ac284
M2 Ultra METAL large-v2-q5_0 1 1 233.88 10.83 5.56 0.24 83ac284
M2 Ultra METAL large-v2-q5_1 1 1 234.03 10.73 5.46 0.24 83ac284
M2 Ultra METAL large-v2-q8_0 1 1 210.83 10.29 5.23 0.22 83ac284
M2 Ultra METAL large-v2-dis 1 1 175.37 1.18 0.42 0.02 83ac284
M2 Ultra METAL large-v3-turbo 1 1 177.35 1.85 0.73 0.03 83ac284
M2 Ultra METAL large-v3-turbo-q5_0 1 1 209.31 1.69 0.80 0.04 83ac284
M2 Ultra METAL large-v3-turbo-q8_0 1 1 189.55 1.64 0.75 0.03 83ac284

Full Changelog: v1.7.1...v1.7.2

v1.7.2-pre

15 Nov 14:05
f02b40b
Pre-release

Overview

This is a pre-release because there have been some reports of memory leaks which I haven't had the time to investigate and confirm. If these are resolved in the next few days, I will include the fixes in the official 1.7.2 release next week.

  • Various improvements in the Metal backend
  • Fix extra memory usage for large samples
  • Remove limit for ggml_context (i.e. more beams and processors are supported)
CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 Ultra METAL tiny 1 1 9.51 1.39 0.41 0.01 83ac284
M2 Ultra METAL tiny-q5_0 1 1 9.57 1.41 0.42 0.01 83ac284
M2 Ultra METAL tiny-q5_1 1 1 8.74 1.39 0.42 0.01 83ac284
M2 Ultra METAL tiny-q8_0 1 1 8.36 1.33 0.41 0.01 83ac284
M2 Ultra METAL base 1 1 14.27 1.90 0.63 0.02 83ac284
M2 Ultra METAL base-q5_0 1 1 15.50 1.90 0.65 0.02 83ac284
M2 Ultra METAL base-q5_1 1 1 15.67 1.88 0.65 0.02 83ac284
M2 Ultra METAL base-q8_0 1 1 14.69 1.81 0.63 0.02 83ac284
M2 Ultra METAL small 1 1 40.85 3.77 1.43 0.05 83ac284
M2 Ultra METAL small-q5_0 1 1 45.99 3.90 1.52 0.05 83ac284
M2 Ultra METAL small-q5_1 1 1 46.19 3.83 1.50 0.06 83ac284
M2 Ultra METAL small-q8_0 1 1 42.90 3.65 1.46 0.05 83ac284
M2 Ultra METAL medium 1 1 109.01 7.59 3.24 0.11 83ac284
M2 Ultra METAL medium-q5_0 1 1 126.78 7.55 3.45 0.13 83ac284
M2 Ultra METAL medium-q5_1 1 1 127.71 7.39 3.43 0.13 83ac284
M2 Ultra METAL medium-q8_0 1 1 115.97 7.21 3.35 0.12 83ac284
M2 Ultra METAL medium-dis 1 1 97.74 1.06 0.36 0.01 83ac284
M2 Ultra METAL large-v2 1 1 196.99 11.29 5.06 0.20 83ac284
M2 Ultra METAL large-v2-q5_0 1 1 233.88 10.83 5.56 0.24 83ac284
M2 Ultra METAL large-v2-q5_1 1 1 234.03 10.73 5.46 0.24 83ac284
M2 Ultra METAL large-v2-q8_0 1 1 210.83 10.29 5.23 0.22 83ac284
M2 Ultra METAL large-v2-dis 1 1 175.37 1.18 0.42 0.02 83ac284
M2 Ultra METAL large-v3-turbo 1 1 177.35 1.85 0.73 0.03 83ac284
M2 Ultra METAL large-v3-turbo-q5_0 1 1 209.31 1.69 0.80 0.04 83ac284
M2 Ultra METAL large-v3-turbo-q8_0 1 1 189.55 1.64 0.75 0.03 83ac284

Full Changelog: v1.7.1...v1.7.2-pre

v1.7.1

07 Oct 10:09
ebca09a

Overview

  • Fix Vulkan crashes
  • Performance stats for Vulkan on RTX 2060
GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 VULKAN tiny 1 0 30.38 1.37 1.04 0.05 9f346d0
RTX 2060 VULKAN tiny-q5_0 1 0 20.98 1.38 0.99 0.05 9f346d0
RTX 2060 VULKAN tiny-q5_1 1 0 20.74 1.30 0.96 0.05 9f346d0
RTX 2060 VULKAN base 1 0 44.69 1.59 1.78 0.09 9f346d0
RTX 2060 VULKAN base-q5_0 1 0 39.72 2.11 1.72 0.08 9f346d0
RTX 2060 VULKAN base-q5_1 1 0 39.45 2.01 1.63 0.08 9f346d0
RTX 2060 VULKAN small 1 0 160.02 3.53 4.64 0.23 9f346d0
RTX 2060 VULKAN small-q5_0 1 0 141.52 4.54 4.44 0.20 9f346d0
RTX 2060 VULKAN small-q5_1 1 0 141.03 4.63 4.18 0.20 9f346d0
RTX 2060 VULKAN medium 1 0 472.66 7.55 11.35 0.56 9f346d0
RTX 2060 VULKAN medium-q5_0 1 0 395.55 9.81 10.64 0.49 9f346d0
RTX 2060 VULKAN medium-q5_1 1 0 398.85 10.16 10.15 0.50 9f346d0
RTX 2060 VULKAN medium-dis 1 0 427.26 1.26 1.20 0.08 9f346d0
RTX 2060 VULKAN large-v2 1 0 924.60 12.36 18.56 1.01 9f346d0
RTX 2060 VULKAN large-v2-q5_0 1 0 774.21 17.25 17.17 0.85 9f346d0
RTX 2060 VULKAN large-v2-q5_1 1 0 779.75 17.44 16.27 0.85 9f346d0
RTX 2060 VULKAN large-v2-dis 1 0 833.35 1.38 1.56 0.10 9f346d0
RTX 2060 VULKAN large-v3-turbo 1 0 839.90 2.11 2.70 0.16 9f346d0
RTX 2060 VULKAN large-v3-turbo-q5_0 1 0 705.49 3.22 2.53 0.14 9f346d0

Full Changelog: v1.7.0...v1.7.1

Binaries

https://github.com/ggerganov/whisper.cpp/actions/runs/11213279590

v1.7.0

05 Oct 14:15
6a94163

Overview

  • Fix crashes with high number of beams
  • Reduce overall VRAM usage
  • Optimize Encoder performance

Some performance numbers for this release:

M2 Ultra

Flash Attention ON:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 Ultra METAL tiny 1 1 8.37 1.44 0.48 0.01 6a94163
M2 Ultra METAL tiny-q5_0 1 1 9.81 1.46 0.50 0.01 6a94163
M2 Ultra METAL tiny-q5_1 1 1 8.80 1.47 0.50 0.01 6a94163
M2 Ultra METAL base 1 1 16.11 1.96 0.74 0.02 6a94163
M2 Ultra METAL base-q5_0 1 1 16.38 1.99 0.78 0.02 6a94163
M2 Ultra METAL base-q5_1 1 1 16.72 2.00 0.77 0.02 6a94163
M2 Ultra METAL small 1 1 41.26 3.88 1.66 0.05 6a94163
M2 Ultra METAL small-q5_0 1 1 46.91 4.02 1.76 0.06 6a94163
M2 Ultra METAL small-q5_1 1 1 47.05 4.00 1.73 0.06 6a94163
M2 Ultra METAL medium 1 1 111.29 7.79 3.63 0.11 6a94163
M2 Ultra METAL medium-q5_0 1 1 129.78 7.71 3.85 0.13 6a94163
M2 Ultra METAL medium-q5_1 1 1 129.29 7.71 3.87 0.13 6a94163
M2 Ultra METAL medium-dis 1 1 99.27 1.09 0.43 0.02 6a94163
M2 Ultra METAL large-v2 1 1 198.81 11.54 5.59 0.20 6a94163
M2 Ultra METAL large-v2-q5_0 1 1 236.18 11.12 6.11 0.24 6a94163
M2 Ultra METAL large-v2-q5_1 1 1 235.88 11.14 6.01 0.24 6a94163
M2 Ultra METAL large-v2-dis 1 1 177.41 1.21 0.48 0.02 6a94163
M2 Ultra METAL large-v3-turbo 1 1 178.92 1.89 0.83 0.03 6a94163
M2 Ultra METAL large-v3-turbo-q5_0 1 1 211.44 1.73 0.90 0.04 6a94163

Flash Attention OFF:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 Ultra METAL tiny 1 0 10.04 1.37 0.50 0.01 6a94163
M2 Ultra METAL tiny-q5_0 1 0 10.02 1.36 0.53 0.01 6a94163
M2 Ultra METAL tiny-q5_1 1 0 11.08 1.37 0.53 0.01 6a94163
M2 Ultra METAL base 1 0 17.84 1.93 0.77 0.02 6a94163
M2 Ultra METAL base-q5_0 1 0 18.57 1.92 0.81 0.02 6a94163
M2 Ultra METAL base-q5_1 1 0 18.66 1.93 0.82 0.02 6a94163
M2 Ultra METAL small 1 0 48.26 3.95 1.73 0.05 6a94163
M2 Ultra METAL small-q5_0 1 0 53.68 3.99 1.85 0.06 6a94163
M2 Ultra METAL small-q5_1 1 0 53.86 4.00 1.82 0.06 6a94163
M2 Ultra METAL medium 1 0 130.09 8.01 3.82 0.13 6a94163
M2 Ultra METAL medium-q5_0 1 0 148.18 7.92 4.11 0.14 6a94163
M2 Ultra METAL medium-q5_1 1 0 147.95 7.94 4.11 0.14 6a94163
M2 Ultra METAL medium-dis 1 0 116.97 1.11 0.42 0.02 6a94163
M2 Ultra METAL large-v2 1 0 232.43 12.34 5.87 0.22 6a94163
M2 Ultra METAL large-v2-q5_0 1 0 269.72 11.68 6.44 0.26 6a94163
M2 Ultra METAL large-v2-q5_1 1 0 269.71 11.82 6.36 0.26 6a94163
M2 Ultra METAL large-v2-dis 1 0 209.25 1.25 0.48 0.02 6a94163
M2 Ultra METAL large-v3-turbo 1 0 211.09 1.98 0.84 0.03 6a94163
M2 Ultra METAL large-v3-turbo-q5_0 1 0 244.23 1.81 0.92 0.04 6a94163

Ryzen 9 5950X + RTX 2060

Flash Attention ON:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 AVX2 CUDA tiny 1 1 7.35 0.78 0.24 0.01 6a94163
RTX 2060 AVX2 CUDA tiny-q5_0 1 1 6.45 0.67 0.14 0.01 6a94163
RTX 2060 AVX2 CUDA tiny-q5_1 1 1 6.39 0.66 0.14 0.01 6a94163
RTX 2060 AVX2 CUDA base 1 1 10.20 0.88 0.30 0.01 6a94163
RTX 2060 AVX2 CUDA base-q5_0 1 1 11.38 0.92 0.21 0.02 6a94163
RTX 2060 AVX2 CUDA base-q5_1 1 1 11.76 0.91 0.20 0.02 6a94163
RTX 2060 AVX2 CUDA small 1 1 33.06 2.00 0.56 0.03 6a94163
RTX 2060 AVX2 CUDA small-q5_0 1 1 35.84 1.84 0.43 0.04 6a94163
RTX 2060 AVX2 CUDA small-q5_1 1 1 36.89 1.82 0.42 0.04 6a94163
RTX 2060 AVX2 CUDA medium 1 1 90.65 4.54 1.13 0.08 6a94163
RTX 2060 AVX2 CUDA medium-q5_0 1 1 104.01 3.80 0.91 0.10 6a94163
RTX 2060 AVX2 CUDA medium-q5_1 1 1 107.98 3.72 0.87 0.10 6a94163
RTX 2060 AVX2 CUDA medium-dis 1 1 79.08 0.68 0.17 0.01 6a94163
RTX 2060 AVX2 CUDA large-v2 1 1 162.00 7.52 1.92 0.14 6a94163
RTX 2060 AVX2 CUDA large-v2-q5_0 1 1 184.59 5.64 1.50 0.16 6a94163
RTX 2060 AVX2 CUDA large-v2-q5_1 1 1 193.85 5.55 1.44 0.17 6a94163
RTX 2060 AVX2 CUDA large-v2-dis 1 1 140.75 0.84 0.37 0.02 6a94163
RTX 2060 AVX2 CUDA large-v3-turbo 1 1 143.38 1.29 0.36 0.02 6a94163
RTX 2060 AVX2 CUDA large-v3-turbo-q5_0 1 1 163.30 0.93 0.28 0.03 6a94163

Flash Attention OFF:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 AVX2 CUDA tiny 1 0 12.49 0.87 0.23 0.01 6a94163
RTX 2060 AVX2 CUDA tiny-q5_0 1 0 10.65 0.78 0.19 0.02 6a94163
RTX 2060 AVX2 CUDA tiny-q5_1 1 0 10.82 0.77 0.19 0.02 6a94163
RTX 2060 AVX2 CUDA base 1 0 18.97 1.04 0.34 0.02 6a94163
RTX 2060 AVX2 CUDA base-q5_0 1 0 20.22 1.09 0.27 0.02 6a94163
RTX 2060 AVX2 CUDA base-q5_1 1 0 20.48 1.07 0.27 0.02 6a94163
RTX 2060 AVX2 CUDA small 1 0 59.52 2.37 0.70 0.05 6a94163
RTX 2060 AVX2 CUDA small-q5_0 1 0 62.98 2.23 0.60 0.06 6a94163
RTX 2060 AVX2 CUDA small-q5_1 1 0 63.64 2.21 0.59 0.06 6a94163
RTX 2060 AVX2 CUDA medium 1 0 161.53 5.36 1.53 0.13 6a94163
RTX 2060 AVX2 CUDA medium-q5_0 1 0 174.96 4.64 1.32 0.15 6a94163
RTX 2060 AVX2 CUDA medium-q5_1 1 0 178.42 4.57 1.29 0.15 6a94163
RTX 2060 AVX2 CUDA medium-dis 1 0 149.65 0.75 0.20 0.02 6a94163
RTX 2060 AVX2 CUDA large-v2 1 0 280.55 8.74 2.51 0.23 6a94163
RTX 2060 AVX2 CUDA large-v2-q5_0 1 0 306.87 6.92 2.08 0.25 6a94163
RTX 2060 AVX2 CUDA large-v2-q5_1 1 0 314.25 6.82 2.02 0.26 6a94163
RTX 2060 AVX2 CUDA large-v2-dis 1 0 259.39 0.91 0.37 0.02 6a94163
RTX 2060 AVX2 CUDA large-v3-turbo 1 0 261.83 1.44 0.41 0.04 6a94163
RTX 2060 AVX2 CUDA large-v3-turbo-q5_0 1 0 282.99 1.09 0.33 0.04 6a94163

Vulkan:

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 VULKAN tiny 1 0 30.38 1.37 1.04 0.05 9f346d0
RTX 2060 VULKAN tiny-q5_0 1 0 20.98 1.38 0.99 0.05 9f346d0
RTX 2060 VULKAN tiny-q5_1 1 0 20.74 1.30 0.96 0.05 9f346d0
RTX 2060 VULKAN base 1 0 44.69 1.59 1.78 0.09 9f346d0
RTX 2060 VULKAN base-q5_0 1 0 39.72 2.11 1.72 0.08 9f346d0
RTX 2060 VULKAN base-q5_1 1 0 39.45 2.01 1.63 0.08 9f346d0
RTX 2060 VULKAN small 1 0 160.02 3.53 4.64 0.23 9f346d0
RTX 2060 VULKAN small-q5_0 1 0 141.52 4.54 4.44 0.20 9f346d0
RTX 2060 VULKA...

v1.6.2

27 May 07:36
c7b6988

Overview

Bug fix for using multiple whisper_state instances in parallel: #2182

Full Changelog: v1.6.1...v1.6.2