EchoInStone is an audio processing tool that transcribes, diarizes, and aligns speaker segments from audio files, prioritizing accuracy and reliability.

jeanjerome/EchoInStone

EchoInStone

EchoInStone is a comprehensive audio processing tool designed to transcribe, diarize, and align speaker segments from audio files with a focus on achieving the most accurate and faithful transcription possible. It supports various audio sources, including YouTube videos and podcasts, and provides a flexible pipeline for processing audio data, prioritizing precision and reliability over speed.

Features

  • Transcription: Convert audio files into text using a state-of-the-art automatic speech recognition (ASR) model, Whisper Large v3 Turbo.
  • Diarization: Identify and separate the different speakers in an audio file using the Pyannote Speaker Diarization 3.1 model.
  • Alignment: Align transcribed text with the corresponding speaker segments using SpeakerAlignement, a custom algorithm designed to be efficient and to stay faithful to the outputs of Whisper and Pyannote.
  • Flexible and Extensible Pipeline: Easily integrate new models or processing steps into the pipeline orchestrated by AudioProcessingOrchestrator.
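
To give an intuition for the alignment step, here is a minimal sketch that pairs each transcribed chunk with the speaker whose diarization turn overlaps it the most. It is not the SpeakerAlignement algorithm itself, whose details are not described here; the `Segment` class and the `assign_speakers` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A time interval with a label (hypothetical type, for illustration only)."""
    start: float  # seconds
    end: float    # seconds
    label: str    # transcribed text for a chunk, speaker name for a turn


def overlap(a: Segment, b: Segment) -> float:
    """Duration in seconds shared by both intervals (0.0 if they are disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speakers(chunks: list[Segment], turns: list[Segment]) -> list[tuple[str, str]]:
    """Pair each transcribed chunk with the speaker whose turn overlaps it most."""
    result = []
    for chunk in chunks:
        best = max(turns, key=lambda turn: overlap(chunk, turn), default=None)
        if best is None or overlap(chunk, best) == 0.0:
            speaker = "UNKNOWN"  # no diarization turn covers this chunk
        else:
            speaker = best.label
        result.append((speaker, chunk.label))
    return result


# Made-up timestamps standing in for ASR chunks and diarization turns.
chunks = [Segment(0.0, 2.5, "Hello and welcome."), Segment(2.5, 5.0, "Thanks for having me.")]
turns = [Segment(0.0, 2.4, "SPEAKER_00"), Segment(2.4, 6.0, "SPEAKER_01")]
aligned = assign_speakers(chunks, turns)
```

A maximum-overlap rule is only one possible strategy; it ignores chunks that span a speaker change, which a more careful algorithm would need to split.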

Note: The current version of EchoInStone is a preliminary release. Future updates will include more flexible configuration options and enhanced functionality.

Installation

Prerequisites

  • Python 3.11 or higher
  • Poetry (dependency management tool)

Steps

  1. Clone the repository:

    git clone https://github.com/jeanjerome/EchoInStone.git
    cd EchoInStone
  2. Install dependencies using Poetry:

    poetry install
  3. Configure logging (optional):

    • The logging configuration is set up to output logs to both the console and a file (app.log). You can modify the logging settings in logging_config.py.
  4. Configure your Hugging Face token:

    • Add your Hugging Face token to the EchoInStone/config.py file. To obtain a token:
      1. Go to Hugging Face Settings.
      2. Click on "New token".
      3. Copy the generated token and paste it into EchoInStone/config.py as shown below:

    # EchoInStone/config.py

    # Hugging Face authentication token
    HUGGING_FACE_TOKEN = "your_token_here"

Usage

Basic Example

To transcribe and diarize an audio source, run the following command:

poetry run python main.py <audio_input_url>
  • <audio_input_url>: The URL of the audio input (YouTube, podcast, or direct audio file).

Command-Line Arguments

  • --output_dir: Directory to save the output files. Default is "results".

    poetry run python main.py <audio_input_url> --output_dir <output_directory>
  • --transcription_output: Filename for the transcription output. Default is "speaker_transcriptions.json".

    poetry run python main.py <audio_input_url> --transcription_output <output_filename>
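
The arguments above could be declared with the standard `argparse` module roughly as follows. This is a sketch based only on the options and defaults listed in this section; the `build_parser` function name is an assumption, and the actual main.py may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command-line arguments."""
    parser = argparse.ArgumentParser(description="Transcribe and diarize an audio source.")
    parser.add_argument(
        "audio_input_url",
        help="URL of the audio input (YouTube, podcast, or direct audio file)",
    )
    parser.add_argument(
        "--output_dir",
        default="results",
        help="Directory to save the output files",
    )
    parser.add_argument(
        "--transcription_output",
        default="speaker_transcriptions.json",
        help="Filename for the transcription output",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["https://example.com/episode.mp3", "--output_dir", "out"])
```

Here `args.output_dir` is the value passed on the command line, while `args.transcription_output` falls back to its documented default.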

Examples

  • Transcribe and diarize a YouTube video:

    poetry run python main.py "https://www.youtube.com/watch?v=plZRCMx_Jd8"
  • Transcribe and diarize a podcast:

    poetry run python main.py "https://radiofrance-podcast.net/podcast09/rss_13957.xml"
  • Transcribe and diarize a direct MP3 file:

    poetry run python main.py "https://media.radiofrance-podcast.net/podcast09/25425-13.02.2025-ITEMA_24028677-2025C53905E0006-NET_MFC_D378B90D-D570-44E9-AB5A-F0CC63B05A14-21.mp3"
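
The three examples above correspond to three kinds of input: a YouTube page, a podcast RSS feed, and a direct audio file. How EchoInStone tells them apart is not described here; the sketch below shows one plausible approach based on the URL alone. The `classify_source` function and the list of extensions are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Assumed set of audio file extensions; the project may support others.
AUDIO_EXTENSIONS = (".mp3", ".wav", ".m4a", ".flac", ".ogg")


def classify_source(url: str) -> str:
    """Guess the kind of audio source from the URL's host and path."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.lower()
    if host in ("youtu.be", "youtube.com") or host.endswith(".youtube.com"):
        return "youtube"
    if path.endswith(AUDIO_EXTENSIONS):
        return "audio_file"
    if path.endswith((".xml", ".rss")):
        return "podcast_feed"
    return "unknown"


kind = classify_source("https://radiofrance-podcast.net/podcast09/rss_13957.xml")
```

A URL-based guess is cheap but fragile: many feeds and audio files are served without a telling extension, so a real implementation would likely also inspect the response's content type.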

Testing

To run the tests, use the following command:

poetry run pytest

This command runs all the tests, including the BDD tests, to verify that the application works as expected.

Configuration

Logging

Logging is configured to output messages to both the console and a file (app.log). You can adjust the logging level and format in the logging_config.py file.
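
For reference, a console-plus-file setup like the one described can be written with the standard `logging` module as sketched below. The contents of logging_config.py are not shown here, so the function name `configure_logging`, the logger name, and the format string are assumptions.

```python
import logging


def configure_logging(log_file: str = "app.log", level: int = logging.INFO) -> logging.Logger:
    """Send log records to both the console and a file."""
    logger = logging.getLogger("EchoInStone")  # assumed logger name
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate output if called more than once

    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    handlers = (
        logging.StreamHandler(),                          # console (stderr by default)
        logging.FileHandler(log_file, encoding="utf-8"),  # file on disk
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Calling `configure_logging(level=logging.DEBUG)` would raise the verbosity, and changing the format string alters how each line is rendered in both outputs.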

Models

  • Transcription Model: The default transcription model is openai/whisper-large-v3-turbo. You can change this by modifying the model_name parameter in the WhisperAudioTranscriber initialization.
  • Diarization Model: The default diarization model is pyannote/speaker-diarization-3.1. You can change this by modifying the model loading code in the PyannoteDiarizer class.
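
As a rough illustration of what swapping models involves, the sketch below loads both default checkpoints directly with the Hugging Face `transformers` and `pyannote.audio` libraries. It does not reproduce the internals of WhisperAudioTranscriber or PyannoteDiarizer, which are not shown here; the two function names are hypothetical, and the use of `transformers` for Whisper is an assumption. The `use_auth_token` keyword is the one used by pyannote.audio 3.x and may differ in other releases.

```python
ASR_MODEL = "openai/whisper-large-v3-turbo"
DIARIZATION_MODEL = "pyannote/speaker-diarization-3.1"


def load_transcriber(model_name: str = ASR_MODEL):
    """Load a Whisper checkpoint as a speech recognition pipeline."""
    # Imported lazily so this module can be read without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_name)


def load_diarizer(token: str, model_name: str = DIARIZATION_MODEL):
    """Load a pyannote diarization pipeline using a Hugging Face token."""
    # Imported lazily so this module can be read without pyannote.audio installed.
    from pyannote.audio import Pipeline

    # Keyword name as in pyannote.audio 3.x (assumption for other versions).
    return Pipeline.from_pretrained(model_name, use_auth_token=token)
```

Passing a different `model_name` to either function is the equivalent of changing the defaults described above. Both calls download model weights on first use, so they need network access and, for the diarization model, a valid token.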

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Make your changes and commit them (git commit -am 'Add new feature').
  4. Push to the branch (git push origin feature-branch).
  5. Create a new Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

  • Thanks to the open-source community for the various libraries and models used in this project.
  • Special thanks to the contributors and maintainers of the models and tools that make this project possible.

Contact

For any questions or suggestions, please open an issue.
