
Use OpenAI Whisper for speech-to-text transcription #1

Open
@kevinjosethomas

Description


The current interface uses the browser's built-in SpeechRecognition object through the react-speech-recognition library. While this is functional, it is not as accurate as Whisper and often drops the first few words.

The ideal solution would be a client-side one, to reduce the load on the server once this service is publicly accessible. However, I would also prefer not to send voice recordings to OpenAI, so a locally hosted instance of Whisper on the Flask server might have to be the approach. I attempted this by transmitting audio from the client via websockets, but it didn't quite work out. The whisper library is not really designed for real-time transcription, but rather for transcribing uploaded files.
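One workaround under that constraint is to buffer the incoming websocket chunks server-side and hand Whisper a complete file every few seconds instead of a live stream. A minimal sketch, assuming 16 kHz 16-bit mono PCM from the client; `ChunkBuffer` and the injected `transcribe` callback (e.g. wrapping `model.transcribe`) are hypothetical names, not existing project code:

```python
import tempfile
import wave

SAMPLE_RATE = 16_000                 # assumed client capture rate
BYTES_PER_SECOND = SAMPLE_RATE * 2   # 16-bit mono PCM
FLUSH_SECONDS = 5                    # transcribe in 5-second batches

class ChunkBuffer:
    """Accumulate websocket audio chunks; flush whole files to Whisper."""

    def __init__(self, transcribe, flush_seconds=FLUSH_SECONDS):
        self._buf = bytearray()
        # e.g. transcribe = lambda path: model.transcribe(path)["text"]
        self._transcribe = transcribe
        self._flush_bytes = flush_seconds * BYTES_PER_SECOND

    def feed(self, chunk: bytes):
        """Append one chunk; return transcript text once a batch is ready."""
        self._buf.extend(chunk)
        if len(self._buf) < self._flush_bytes:
            return None
        # Write the batch out as a proper WAV file, since the whisper
        # library expects a complete file on disk.
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
            with wave.open(f, "wb") as w:
                w.setnchannels(1)
                w.setsampwidth(2)
                w.setframerate(SAMPLE_RATE)
                w.writeframes(bytes(self._buf))
            path = f.name
        self._buf.clear()
        return self._transcribe(path)
```

The batch length trades latency against accuracy, and words split across a flush boundary would still be a problem, so this is only an approximation of real-time transcription.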

The other alternative is to use something like use-whisper and send the voice recordings to OpenAI to transcribe. This would reduce the server load and also make the transcription more reliable. Toggling between the two options for privacy might be the go-to solution in the future.
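That toggle could be as simple as a per-user setting that routes each recording either to the locally hosted model or to OpenAI's hosted API. A hedged sketch, where `transcribe_audio` and its parameters are assumptions rather than existing project code; the cloud branch uses the `audio.transcriptions.create` call exposed by the v1 `openai` Python client:

```python
def transcribe_audio(path: str, *, allow_cloud: bool,
                     local_model=None, cloud_client=None) -> str:
    """Return the transcript for the audio file at `path`.

    Hypothetical helper: `local_model` is assumed to be something like
    whisper.load_model("base"); `cloud_client` an openai.OpenAI() instance.
    """
    if allow_cloud and cloud_client is not None:
        # Send the recording to OpenAI's hosted Whisper model.
        with open(path, "rb") as f:
            result = cloud_client.audio.transcriptions.create(
                model="whisper-1", file=f)
        return result.text
    # Privacy mode: keep the audio on our own Flask server.
    return local_model.transcribe(path)["text"]
```

The cloud path offloads compute but ships audio to a third party; the local path keeps recordings private at the cost of server load, which is exactly the trade-off described above.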

Metadata


Assignees

No one assigned

    Labels

    enhancement (New feature or request), expressive (English → ASL technology)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
