A CLI text-to-speech tool using the Kokoro model, supporting multiple languages, voices (with blending), and various input formats including EPUB books.
- Multiple language and voice support
- Voice blending with customizable weights
- EPUB and TXT file input support
- Standard input (stdin) and
|
piping from other programs - Streaming audio playback
- Split output into chapters
- Adjustable speech speed
- WAV and MP3 output formats
- Chapter merging capability
- Detailed debug output option
- GPU Support
- Add GPU support
- Add PDF support
- Add GUI
- Python 3.12
- Clone the repository:
git clone https://github.com/nazdridoy/kokoro-tts.git
cd kokoro-tts
- Install required packages:
pip install -r requirements.txt
or
uv sync
Note: You can also use uv
as a faster alternative to pip for package installation. (This is a uv project)
- Download the required model files:
# Download either voices.json or voices.bin (bin is preferred)
wget https://github.com/nazdridoy/kokoro-tts/releases/download/v1.0.0/voices-v1.0.bin
# Download the model
wget https://github.com/nazdridoy/kokoro-tts/releases/download/v1.0.0/kokoro-v1.0.onnx
Note: The script will automatically use voices.bin if present, falling back to voices.json if bin is not available.
Category | Voices | Language Code |
---|---|---|
🇺🇸 👩 | af_alloy, af_aoede, af_bella, af_heart, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky | en-us |
🇺🇸 👨 | am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck | en-us |
🇬🇧 | bf_alice, bf_emma, bf_isabella, bf_lily, bm_daniel, bm_fable, bm_george, bm_lewis | en-gb |
🇫🇷 | ff_siwis | fr-fr |
🇮🇹 | if_sara, im_nicola | it |
🇯🇵 | jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo | ja |
🇨🇳 | zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang | cmn |
Basic usage:
./kokoro-tts <input_text_file> [<output_audio_file>] [options]
-h, --help
: Show help message--help-languages
: List supported languages--help-voices
: List available voices--merge-chunks
: Merge existing chunks into chapter files
--stream
: Stream audio instead of saving to file--speed <float>
: Set speech speed (default: 1.0)--lang <str>
: Set language (default: en-us)--voice <str>
: Set voice or blend voices (default: interactive selection)- Single voice: Use voice name (e.g., "af_sarah")
- Blended voices: Use "voice1:weight,voice2:weight" format
--split-output <dir>
: Save each chunk as separate file in directory--format <str>
: Audio format: wav or mp3 (default: wav)--debug
: Show detailed debug information during processing
.txt
: Text file input.epub
: EPUB book input (will process chapters)
# Basic usage with output file
kokoro-tts input.txt output.wav --speed 1.2 --lang en-us --voice af_sarah
# Read from standard input (stdin)
echo "Hello World" | kokoro-tts /dev/stdin --stream
cat input.txt | kokoro-tts /dev/stdin output.wav
# Use voice blending (60-40 mix)
kokoro-tts input.txt output.wav --voice "af_sarah:60,am_adam:40"
# Use equal voice blend (50-50)
kokoro-tts input.txt --stream --voice "am_adam,af_sarah"
# Process EPUB and split into chunks
kokoro-tts input.epub --split-output ./chunks/ --format mp3
# Stream audio directly
kokoro-tts input.txt --stream --speed 0.8
# Merge existing chunks
kokoro-tts --merge-chunks --split-output ./chunks/ --format wav
# Process EPUB with detailed debug output
kokoro-tts input.epub --split-output ./chunks/ --debug
# List available voices
kokoro-tts --help-voices
# List supported languages
kokoro-tts --help-languages
- Automatically extracts chapters from EPUB files
- Preserves chapter titles and structure
- Creates organized output for each chapter
- Detailed debug output available for troubleshooting
- Chunks long text into manageable segments
- Supports streaming for immediate playback
- Voice blending with customizable mix ratios
- Progress indicators for long processes
- Handles interruptions gracefully
- Single file output
- Split output with chapter organization
- Chunk merging capability
- Multiple audio format support
- Shows detailed information about file processing
- Displays NCX parsing details for EPUB files
- Lists all found chapters and their metadata
- Helps troubleshoot processing issues
- Text file input (.txt)
- EPUB book input (.epub)
- Standard input (stdin)
- Supports piping from other programs
This is a personal project. But if you want to contribute, please feel free to submit a Pull Request.
This project is licensed under the MIT License. See the LICENSE file for details.