Description
The current interface uses the browser's built-in SpeechRecognition object through the react-speech-recognition library. While functional, it is noticeably less accurate than Whisper and often misses the first few words of an utterance.
The ideal solution would be client-side, to reduce load on the server once this service is publicly accessible. However, I would also prefer not to send voice recordings to OpenAI, so a locally hosted instance of Whisper on the Flask server might have to be the approach. I attempted this by streaming audio from the client over WebSockets, but it didn't quite work out: the whisper library is not really designed for real-time transcription, but rather for complete uploaded files that are transcribed as a whole.
The other alternative is to use something like use-whisper
and send the voice recordings to OpenAI for transcription. This would reduce server load and make transcription more reliable. Toggling between the two options, depending on the user's privacy preference, might be the go-to solution in the future.