EchoInStone is a comprehensive audio processing tool designed to transcribe, diarize, and align speaker segments from audio files with a focus on achieving the most accurate and faithful transcription possible. It supports various audio sources, including YouTube videos and podcasts, and provides a flexible pipeline for processing audio data, prioritizing precision and reliability over speed.
- Transcription: Convert audio files into text using a state-of-the-art automatic speech recognition (ASR) model, Whisper Large v3 Turbo.
- Diarization: Identify and separate the different speakers in an audio file with the Pyannote Speaker Diarization 3.1 model.
- Alignment: Align transcribed text with the corresponding audio segments using SpeakerAlignement, a custom algorithm tailored to be efficient and faithful to the outputs of Whisper and Pyannote.
- Flexible and Extensible Pipeline: Easily integrate new models or processing steps into the orchestrated pipeline, AudioProcessingOrchestrator.
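To make the alignment step concrete, here is a minimal sketch of one common approach: give each transcribed chunk the speaker whose diarization turn overlaps it the most. This is not the project's SpeakerAlignement implementation; the `Segment` class, the `assign_speakers` function, and the sample timestamps are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # start time in seconds
    end: float    # end time in seconds
    label: str    # transcribed text, or a speaker name for diarization turns


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the time interval shared by two segments."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speakers(chunks, turns):
    """Pair each transcription chunk with the speaker turn it overlaps most."""
    result = []
    for chunk in chunks:
        best = max(turns, key=lambda turn: overlap(chunk, turn), default=None)
        if best is None or overlap(chunk, best) == 0.0:
            speaker = "UNKNOWN"  # no diarization turn covers this chunk
        else:
            speaker = best.label
        result.append((speaker, chunk.start, chunk.end, chunk.label))
    return result


# Made-up data standing in for Whisper chunks and Pyannote speaker turns.
chunks = [
    Segment(0.0, 2.5, "Hello and welcome."),
    Segment(2.5, 5.0, "Thanks for having me."),
]
turns = [
    Segment(0.0, 2.4, "SPEAKER_00"),
    Segment(2.4, 5.2, "SPEAKER_01"),
]
aligned = assign_speakers(chunks, turns)
for speaker, start, end, text in aligned:
    print(f"[{start:.1f}-{end:.1f}] {speaker}: {text}")
```

A maximum-overlap rule is simple but ignores chunks that span a speaker change; a more faithful alignment would likely split such chunks at the turn boundary.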
Note: The current version of EchoInStone is a preliminary release. Future updates will include more flexible configuration options and enhanced functionality.
- Python 3.11 or higher
- Poetry (dependency management tool)
- Clone the repository:
    git clone https://github.com/jeanjerome/EchoInStone.git
    cd EchoInStone
- Install dependencies using Poetry:
    poetry install
- Configure logging (optional):
  - The logging configuration is set up to output logs to both the console and a file (app.log). You can modify the logging settings in logging_config.py.
- Configure Hugging Face Token:
  - Add your Hugging Face token to the EchoInStone/config.py file. You can obtain a token by following these steps:
    - Go to Hugging Face Settings.
    - Click on "New token".
    - Copy the generated token and paste it into the EchoInStone/config.py file as shown below:
        # EchoInStone/config.py
        # Hugging Face authentication token
        HUGGING_FACE_TOKEN = "your_token_here"
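If you would rather not store the token in a tracked file, one option is to have config.py read it from the environment. The sketch below is an assumption about how that could look, not part of the project: the helper name `load_hugging_face_token` and the choice of the `HF_TOKEN` variable are illustrative.

```python
import os


def load_hugging_face_token(env_var: str = "HF_TOKEN",
                            fallback: str = "your_token_here") -> str:
    """Return the token from the environment, or the placeholder if unset."""
    return os.environ.get(env_var, fallback)


# Hugging Face authentication token (same name as in EchoInStone/config.py)
HUGGING_FACE_TOKEN = load_hugging_face_token()
```

With this in place, you would export the variable in your shell before running the tool, and the placeholder is only used when the variable is missing.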
To transcribe and diarize an audio source such as a YouTube video, run the following command:
    poetry run python main.py <audio_input_url>
- <audio_input_url>: The URL of the audio input (YouTube, podcast, or direct audio file).
- --output_dir: Directory to save the output files. Default is "results".
    poetry run python main.py <audio_input_url> --output_dir <output_directory>
- --transcription_output: Filename for the transcription output. Default is "speaker_transcriptions.json".
    poetry run python main.py <audio_input_url> --transcription_output <output_filename>
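The command-line interface described above could be expressed with `argparse` roughly as follows. This is a sketch based only on the documented argument names and defaults; how main.py actually parses its arguments is an assumption, and `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(
        description="Transcribe and diarize an audio source."
    )
    parser.add_argument(
        "audio_input_url",
        help="URL of the audio input (YouTube, podcast, or direct audio file)",
    )
    parser.add_argument(
        "--output_dir",
        default="results",
        help="Directory to save the output files",
    )
    parser.add_argument(
        "--transcription_output",
        default="speaker_transcriptions.json",
        help="Filename for the transcription output",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["https://example.com/audio.mp3", "--output_dir", "out"]
)
print(args.output_dir, args.transcription_output)
```

Here `--output_dir` is overridden while `--transcription_output` keeps its default, which is the behavior the option descriptions imply.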
- Transcribe and diarize a YouTube video:
    poetry run python main.py "https://www.youtube.com/watch?v=plZRCMx_Jd8"
- Transcribe and diarize a podcast:
    poetry run python main.py "https://radiofrance-podcast.net/podcast09/rss_13957.xml"
- Transcribe and diarize a direct MP3 file:
    poetry run python main.py "https://media.radiofrance-podcast.net/podcast09/25425-13.02.2025-ITEMA_24028677-2025C53905E0006-NET_MFC_D378B90D-D570-44E9-AB5A-F0CC63B05A14-21.mp3"
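The three examples above correspond to three kinds of input URL. As a rough illustration of how such inputs might be told apart, the sketch below classifies a URL by host and file extension. It is purely hypothetical: `classify_source`, the extension list, and the category names are assumptions, and the project may detect input types differently.

```python
from urllib.parse import urlparse

# Assumed set of direct-audio extensions; adjust to the formats you need.
AUDIO_EXTENSIONS = (".mp3", ".wav", ".m4a", ".flac", ".ogg")


def classify_source(url: str) -> str:
    """Guess the kind of audio source from the URL's host and path."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path.lower()
    if host in ("youtube.com", "youtu.be") or host.endswith(".youtube.com"):
        return "youtube"
    if path.endswith(AUDIO_EXTENSIONS):
        return "audio_file"
    if path.endswith((".xml", ".rss")):
        return "podcast_feed"
    return "unknown"


print(classify_source("https://www.youtube.com/watch?v=plZRCMx_Jd8"))
print(classify_source("https://radiofrance-podcast.net/podcast09/rss_13957.xml"))
```

Extension-based detection is only a heuristic; a feed served without an `.xml` suffix would fall through to "unknown" here.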
To run the tests, use the following command:
poetry run pytest
This command will execute all the tests, including BDD tests, to ensure the functionality of the application.
Logging is configured to output messages to both the console and a file (app.log). You can adjust the logging level and format in the logging_config.py file.
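For reference, a console-plus-file setup like the one described can be built with Python's standard `logging` module. The sketch below is not the content of logging_config.py; the function name `configure_logging`, the logger name, and the format string are assumptions chosen for illustration.

```python
import logging


def configure_logging(log_file: str = "app.log",
                      level: int = logging.INFO) -> logging.Logger:
    """Attach a console handler and a file handler to a named logger."""
    logger = logging.getLogger("echoinstone_sketch")  # hypothetical logger name
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers if called more than once
    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    )
    for handler in (logging.StreamHandler(), logging.FileHandler(log_file)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Calling `configure_logging(level=logging.DEBUG)` would raise the verbosity, and changing the `Formatter` string changes the layout of each log line.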
- Transcription Model: The default transcription model is openai/whisper-large-v3-turbo. You can change this by modifying the model_name parameter in the WhisperAudioTranscriber initialization.
- Diarization Model: The default diarization model is pyannote/speaker-diarization-3.1. You can change this by modifying the model loading code in the PyannoteDiarizer class.
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Make your changes and commit them (git commit -am 'Add new feature').
- Push to the branch (git push origin feature-branch).
- Create a new Pull Request.
This project is licensed under the MIT License. See the LICENSE file for details.
- Thanks to the open-source community for the various libraries and models used in this project.
- Special thanks to the contributors and maintainers of the models and tools that make this project possible.
For any questions or suggestions, please open an issue.