A GPU-accelerated speech-to-text service leveraging faster-whisper and Ray Serve. This service utilizes OpenAI's Whisper model (large-v3) optimized with faster-whisper for significantly improved transcription speeds and reduced resource consumption, while Ray Serve and FastAPI enable scalable and low-latency real-time processing.
- Optimized speech-to-text transcription using the Whisper large-v3 model with faster-whisper enhancements
- GPU acceleration with CUDA support
- Streaming capability
- Containerized deployment with Docker
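To make the architecture concrete, here is a minimal sketch of how faster-whisper can be wrapped in a Ray Serve deployment with a FastAPI ingress. It follows the documented APIs of both libraries, but the class and variable names are hypothetical and the repository's actual implementation may differ:

```python
# sketch.py -- illustrative only; hypothetical names, not the repository's code.
import io

from fastapi import FastAPI, Request
from faster_whisper import WhisperModel
from ray import serve

app = FastAPI()

@serve.deployment(ray_actor_options={"num_gpus": 1})
@serve.ingress(app)
class Transcriber:
    def __init__(self):
        # Load the large-v3 model once per replica, on GPU with fp16 weights.
        self.model = WhisperModel("large-v3", device="cuda", compute_type="float16")

    @app.post("/")
    async def transcribe(self, request: Request) -> dict:
        # The request body is the raw audio file (webm, wav, mp3, ...).
        audio = io.BytesIO(await request.body())
        # transcribe() returns a lazy generator of segments plus metadata.
        segments, _info = self.model.transcribe(audio)
        text = "".join(segment.text for segment in segments)
        return {"transcription": text.strip()}

# Entry point for `serve run sketch:entrypoint`; the HTTP port (8033 here)
# would be configured through Serve's HTTP options or the Docker setup.
entrypoint = Transcriber.bind()
```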
- Docker with NVIDIA Container Runtime installed
- NVIDIA GPU with appropriate drivers
- Docker Compose
- At least 10GB of GPU memory recommended
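Before starting the service, you can sanity-check that a suitable GPU is visible. The snippet below assumes PyTorch is installed on the host (running nvidia-smi works just as well):

```python
# Verify that a CUDA-capable GPU with enough memory is visible.
# Assumes PyTorch is installed on the host; `nvidia-smi` works equally well.
import torch

assert torch.cuda.is_available(), "No CUDA GPU detected"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1e9:.1f} GB total memory")
```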
- Pull the Docker image:

  ```bash
  docker pull dimadgo/ray_whisper:1
  ```

- Start the service:

  ```bash
  docker-compose up
  ```
The service will be available at http://localhost:8033
```bash
curl -X POST -H "Content-Type: application/octet-stream" \
  --data-binary @your_audio_file.webm http://localhost:8033/
```
The service accepts audio files in various formats (webm, wav, mp3, etc.) and returns a JSON response with the transcription:
```json
{
  "transcription": "Your transcribed text will appear here"
}
```
A test client and sample audio file are provided in the repository:
```bash
python test_client.py
```
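To call the service from your own code instead, a client can be as small as the following sketch (hypothetical file and parameter choices; the bundled test_client.py may differ):

```python
# minimal_client.py -- illustrative sketch; the bundled test_client.py may differ.
import requests

with open("your_audio_file.webm", "rb") as f:
    response = requests.post(
        "http://localhost:8033/",
        data=f.read(),
        headers={"Content-Type": "application/octet-stream"},
        timeout=300,  # long recordings can take a while to transcribe
    )

response.raise_for_status()
print(response.json()["transcription"])
```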
- The service uses the Whisper large-v3 model for optimal transcription quality
- GPU acceleration is required for reasonable performance
- The container is allocated 10.24 GB of shared memory to handle large audio files
- Multiple GPU devices can be utilized by adjusting the `count` parameter in docker-compose.yml (see the sketch below)
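For reference, the relevant parts of a docker-compose.yml for this kind of setup might look like the excerpt below. This is an illustrative sketch using Compose's standard GPU reservation syntax, not the repository's exact file:

```yaml
# Illustrative excerpt only; the repository's docker-compose.yml may differ.
services:
  whisper:
    image: dimadgo/ray_whisper:1
    ports:
      - "8033:8033"
    shm_size: "10.24gb"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1          # raise this to use multiple GPUs
              capabilities: [gpu]
```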
By combining faster-whisper with Ray Serve, this service achieves:
- High Performance: faster-whisper is a reimplementation of OpenAI's Whisper using the CTranslate2 inference engine, offering up to 4x faster transcription while reducing memory usage.
- Scalability: Ray Serve provides robust scalability and flexible deployment options, supporting load balancing, parallel processing, and efficient resource utilization.
- Real-time Transcription: This combination enables low-latency processing, ideal for live transcription applications.
- Ease of Deployment: Leveraging Docker with Ray Serve simplifies containerization, orchestration, and maintenance of the service.
- Flexibility: Both faster-whisper and Ray Serve support GPU acceleration and inference optimizations like quantization, making them suitable for diverse environments.
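As a concrete example of the quantization point above, faster-whisper selects the inference precision through the `compute_type` argument of `WhisperModel`:

```python
from faster_whisper import WhisperModel

# Full fp16 on GPU (a common choice for large models):
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# 8-bit weight quantization with fp16 activations: lower GPU memory use,
# usually at only a small accuracy cost.
model = WhisperModel("large-v3", device="cuda", compute_type="int8_float16")

# Pure int8 on CPU, for machines without a GPU:
model = WhisperModel("large-v3", device="cpu", compute_type="int8")
```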
For more details, please refer to the faster-whisper and Ray Serve documentation.