feat: improve asr with mp3 and gemini #60
Merged
resolve #54
Audio Transcription Latency Improvement Report
Overview
In educational environments, network stability is often inconsistent, which delays audio transcription. Real-time transcription is not the current goal, but keeping round-trip latency under 5 seconds (from audio submission to receiving the transcription result) is critical to the user experience.
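To make the 5-second target concrete, here is a minimal sketch of how round-trip latency could be measured around any transcription call. The names `timed_round_trip`, `within_budget`, and `fake_transcribe` are hypothetical and not taken from this PR; `fake_transcribe` stands in for the real service call.

```python
import time
from typing import Callable, Tuple

# Target from the overview: submission-to-result in under 5 seconds.
LATENCY_BUDGET_S = 5.0


def timed_round_trip(transcribe: Callable[[bytes], str], audio: bytes) -> Tuple[str, float]:
    """Call `transcribe` on `audio` and return (text, elapsed seconds)."""
    start = time.perf_counter()
    text = transcribe(audio)
    elapsed = time.perf_counter() - start
    return text, elapsed


def within_budget(elapsed_s: float, budget_s: float = LATENCY_BUDGET_S) -> bool:
    """True when the measured round trip is strictly under the budget."""
    return elapsed_s < budget_s


def fake_transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for the real upload + transcription request."""
    time.sleep(0.01)  # simulate a short network and processing delay
    return "hello"


if __name__ == "__main__":
    text, elapsed = timed_round_trip(fake_transcribe, b"\x00" * 1024)
    print(f"text={text!r} elapsed={elapsed:.3f}s ok={within_budget(elapsed)}")
```

Timing the whole call with `time.perf_counter()` captures upload, service processing, and download together, which matches the round-trip definition used above.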
Current Test Results
Audio Formats and Sizes:
Latency Results Summary:
Key Observations
Key Improvement Areas
1. Adopt MP3 over WAV
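As a rough illustration of why the format matters on a slow uplink, the sketch below estimates payload size and upload time for uncompressed PCM WAV versus constant-bitrate MP3. All parameters (16 kHz mono 16-bit audio, a 32 kbps MP3, a 512 kbps uplink) are assumptions chosen for the example, not measurements from this PR, and the MP3 estimate ignores tag and frame overhead.

```python
def pcm_wav_bytes(duration_s: float, sample_rate_hz: int = 16_000,
                  bits_per_sample: int = 16, channels: int = 1) -> int:
    """Approximate size of a PCM WAV file: raw samples plus a 44-byte header."""
    raw = duration_s * sample_rate_hz * (bits_per_sample // 8) * channels
    return int(raw) + 44


def mp3_bytes(duration_s: float, bitrate_kbps: int = 32) -> int:
    """Approximate size of a constant-bitrate MP3 (metadata ignored)."""
    return int(duration_s * bitrate_kbps * 1000 / 8)


def upload_seconds(size_bytes: int, uplink_kbps: float) -> float:
    """Ideal transfer time over an uplink of `uplink_kbps` kilobits per second."""
    return size_bytes * 8 / (uplink_kbps * 1000)


if __name__ == "__main__":
    clip_s = 10       # assumed clip length in seconds
    uplink = 512      # assumed uplink in kbps
    wav = pcm_wav_bytes(clip_s)   # 320044 bytes
    mp3 = mp3_bytes(clip_s)       # 40000 bytes
    print(f"WAV: {wav} B, {upload_seconds(wav, uplink):.2f} s upload")
    print(f"MP3: {mp3} B, {upload_seconds(mp3, uplink):.2f} s upload")
```

Under these assumed numbers, the WAV upload alone takes about 5 seconds, which would use up the entire latency budget, while the MP3 upload takes well under 1 second.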
2. Switch to a Faster Transcription Service
Conclusion
A round-trip latency of under 5 seconds is attainable by changing the audio format and the transcription service. These targeted changes should noticeably improve the user experience, particularly in network-limited educational environments.