EchoInStone

EchoInStone is an audio processing tool that transcribes audio files, diarizes them (identifies who speaks when), and aligns the transcript with the speaker segments, aiming for the most accurate and faithful transcription possible. It accepts several audio sources, including YouTube videos and podcasts, and provides a flexible processing pipeline that prioritizes precision and reliability over speed.

Features

Transcription: Convert audio files into text using the state-of-the-art automatic speech recognition (ASR) model Whisper Large v3 Turbo.
Diarization: Identify and separate the different speakers in an audio file using the Pyannote Speaker Diarization 3.1 model.
Alignment: Align the transcribed text with the corresponding speaker segments using SpeakerAlignement, a custom algorithm designed to be efficient and faithful to the outputs of both Whisper and Pyannote.
Flexible and Extensible Pipeline: Integrate new models or processing steps into the orchestrated pipeline, AudioProcessingOrchestrator.
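The alignment step can be pictured as matching timestamped ASR chunks to diarization turns. The sketch below is a generic maximum-overlap heuristic, not the SpeakerAlignement algorithm itself; the Chunk, Turn, and Utterance structures and the align function are hypothetical, assuming the transcriber yields (start, end, text) chunks and the diarizer yields (start, end, speaker) turns, both in seconds.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical: a generic max-overlap heuristic, not EchoInStone's SpeakerAlignement.


@dataclass
class Chunk:
    """A timestamped piece of ASR output (assumed structure)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization segment: who spoke between start and end (assumed structure)."""
    start: float
    end: float
    speaker: str


@dataclass
class Utterance:
    speaker: Optional[str]
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def align(chunks: List[Chunk], turns: List[Turn]) -> List[Utterance]:
    """Give each chunk to the speaker with the largest temporal overlap,
    then merge consecutive chunks attributed to the same speaker."""
    utterances: List[Utterance] = []
    for chunk in chunks:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            o = overlap(chunk.start, chunk.end, turn.start, turn.end)
            if o > best_overlap:
                best_speaker, best_overlap = turn.speaker, o
        if utterances and utterances[-1].speaker == best_speaker:
            # Same speaker as the previous chunk: extend the current utterance.
            last = utterances[-1]
            last.end = chunk.end
            last.text = f"{last.text} {chunk.text}".strip()
        else:
            utterances.append(
                Utterance(best_speaker, chunk.start, chunk.end, chunk.text.strip())
            )
    return utterances
```

For example, a chunk spanning 2.0 to 4.0 s that lies entirely inside a SPEAKER_00 turn is attributed to SPEAKER_00 and merged with a preceding SPEAKER_00 chunk. A chunk that overlaps no turn gets the speaker None; a real aligner would need a better fallback, such as the nearest turn.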

Note: The current version of EchoInStone is a preliminary release. Future updates will include more flexible configuration options and enhanced functionality.

Installation

Prerequisites

Python 3.11 or higher
Poetry (dependency management tool)

Steps

Clone the repository:
```
git clone https://github.com/jeanjerome/EchoInStone.git
cd EchoInStone
```

Install dependencies using Poetry:
```
poetry install
```
Configure logging (optional):
- Logs are written to both the console and a file (app.log). You can modify the logging settings in logging_config.py.

Configure the Hugging Face token:

Add your Hugging Face token to EchoInStone/config.py. To obtain a token:
1. Go to your Hugging Face account settings (Access Tokens).
2. Click "New token".
3. Copy the generated token and paste it into EchoInStone/config.py as shown below:

```python
# EchoInStone/config.py

# Hugging Face authentication token
HUGGING_FACE_TOKEN = "your_token_here"
```
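If you would rather not hardcode the token in a tracked file, one option is to read it from the environment first. The helper below is a hypothetical sketch, not part of EchoInStone; it assumes the HF_TOKEN environment variable (the name the Hugging Face libraries conventionally use) and falls back to the config value documented above.

```python
import os
from typing import Optional


def resolve_hf_token(config_token: Optional[str] = None) -> str:
    """Return a Hugging Face token, preferring the HF_TOKEN environment variable.

    Hypothetical helper: the README's documented approach is HUGGING_FACE_TOKEN
    in EchoInStone/config.py; this only sketches an environment-based alternative.
    """
    token = os.environ.get("HF_TOKEN") or config_token
    # Treat the placeholder value as "not configured".
    if not token or token == "your_token_here":
        raise RuntimeError(
            "No Hugging Face token found: set HF_TOKEN or edit EchoInStone/config.py"
        )
    return token
```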

Usage

Basic Example

To transcribe and diarize audio from a URL (for example, a YouTube video), run:
```
poetry run python main.py <audio_input_url>
```

<audio_input_url>: The URL of the audio input (YouTube, podcast, or direct audio file).
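Since one argument accepts three kinds of input, the program has to tell them apart somehow. How main.py actually does this is not documented here; the classify_source helper below is a hypothetical heuristic based only on the URL's host and file extension, and the extension list is an assumption.

```python
from urllib.parse import urlparse

# Hypothetical heuristic: not necessarily how EchoInStone's main.py dispatches inputs.
AUDIO_EXTENSIONS = (".mp3", ".wav", ".m4a", ".ogg", ".flac")
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_source(url: str) -> str:
    """Return 'youtube', 'audio_file', or 'podcast_feed' for an input URL."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path.lower()
    if host in YOUTUBE_HOSTS:
        return "youtube"
    if path.endswith(AUDIO_EXTENSIONS):
        return "audio_file"
    # Fallback assumption: anything else (e.g. an RSS .xml feed) is a podcast feed.
    return "podcast_feed"
```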

Command-Line Arguments

--output_dir: Directory to save the output files. Default is "results".
```
poetry run python main.py <audio_input_url> --output_dir <output_directory>
```

--transcription_output: Filename for the transcription output. Default is "speaker_transcriptions.json".
```
poetry run python main.py <audio_input_url> --transcription_output <output_filename>
```
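The documented interface (one positional URL plus two optional flags with defaults) maps naturally onto argparse. This is a sketch that mirrors the README rather than a copy of main.py; build_parser and output_path are hypothetical, and joining the two options into one output path is an assumption about where the file lands.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented CLI; the real main.py may differ."""
    parser = argparse.ArgumentParser(description="Transcribe and diarize audio from a URL.")
    parser.add_argument("audio_input_url", help="YouTube, podcast, or direct audio file URL")
    parser.add_argument("--output_dir", default="results",
                        help="Directory to save the output files")
    parser.add_argument("--transcription_output", default="speaker_transcriptions.json",
                        help="Filename for the transcription output")
    return parser


def output_path(args: argparse.Namespace) -> str:
    """Assumed location of the transcription file: <output_dir>/<transcription_output>."""
    return os.path.join(args.output_dir, args.transcription_output)
```

With no flags, output_path resolves to results/speaker_transcriptions.json under these assumptions.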

Examples

Transcribe and diarize a YouTube video:
```
poetry run python main.py "https://www.youtube.com/watch?v=plZRCMx_Jd8"
```

Transcribe and diarize a podcast:
```
poetry run python main.py "https://radiofrance-podcast.net/podcast09/rss_13957.xml"
```

Transcribe and diarize a direct MP3 file:
```
poetry run python main.py "https://media.radiofrance-podcast.net/podcast09/25425-13.02.2025-ITEMA_24028677-2025C53905E0006-NET_MFC_D378B90D-D570-44E9-AB5A-F0CC63B05A14-21.mp3"
```

Testing

To run the tests, use the following command:
```
poetry run pytest
```

This runs the full test suite, including the BDD (behavior-driven development) tests, to verify the application's functionality.

Configuration

Logging

Logging is configured to output messages to both the console and a file (app.log). You can adjust the logging level and format in the logging_config.py file.
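A console-plus-file setup like the one described can be expressed with the standard library's logging.config.dictConfig. This is an illustrative sketch only; it is not the contents of logging_config.py, and the configure_logging function, format string, and default level are assumptions.

```python
import logging
import logging.config


def configure_logging(log_file: str = "app.log", level: str = "INFO") -> None:
    """Send log records to both the console (stderr) and a file.

    Hypothetical sketch; EchoInStone's own logging_config.py may differ.
    """
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "standard": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
        },
        "handlers": {
            # StreamHandler writes to sys.stderr by default.
            "console": {"class": "logging.StreamHandler", "formatter": "standard"},
            "file": {
                "class": "logging.FileHandler",
                "filename": log_file,
                "formatter": "standard",
                "encoding": "utf-8",
            },
        },
        # Both handlers hang off the root logger, so every module's logger uses them.
        "root": {"level": level, "handlers": ["console", "file"]},
    })
```

Changing the level argument (for example to "DEBUG") or the format string is the kind of adjustment the paragraph above refers to.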

Models

Transcription Model: The default transcription model is openai/whisper-large-v3-turbo. You can change this by modifying the model_name parameter in the WhisperAudioTranscriber initialization.
Diarization Model: The default diarization model is pyannote/speaker-diarization-3.1. You can change this by modifying the model loading code in the PyannoteDiarizer class.
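Swapping models is easiest when the pipeline depends on small interfaces rather than concrete classes. The sketch below illustrates that idea with typing.Protocol; the Transcriber and Diarizer protocols, the tuple shapes, and SketchOrchestrator are hypothetical and do not describe the actual WhisperAudioTranscriber, PyannoteDiarizer, or AudioProcessingOrchestrator APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol, Tuple

# Assumed shapes: (start, end, text) for ASR chunks, (start, end, speaker) for turns.
Chunk = Tuple[float, float, str]
Turn = Tuple[float, float, str]


class Transcriber(Protocol):
    def transcribe(self, audio_path: str) -> List[Chunk]: ...


class Diarizer(Protocol):
    def diarize(self, audio_path: str) -> List[Turn]: ...


@dataclass
class SketchOrchestrator:
    """Hypothetical orchestrator: any object with the right method can be swapped in."""
    transcriber: Transcriber
    diarizer: Diarizer
    aligner: Callable[[List[Chunk], List[Turn]], list]

    def run(self, audio_path: str) -> list:
        # Each step only relies on the protocol, not on a specific model class.
        chunks = self.transcriber.transcribe(audio_path)
        turns = self.diarizer.diarize(audio_path)
        return self.aligner(chunks, turns)
```

Under this design, trying a different Whisper checkpoint means constructing a different transcriber object and passing it in, with no change to the orchestration code.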

Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository.
2. Create a new branch (git checkout -b feature-branch).
3. Make your changes and commit them (git commit -am 'Add new feature').
4. Push to the branch (git push origin feature-branch).
5. Open a new pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Thanks to the open-source community for the various libraries and models used in this project.
Special thanks to the contributors and maintainers of the models and tools that make this project possible.

Contact

For any questions or suggestions, please open an issue.
