From 0df04cc259c7094f2b0f64841da634045b3f6894 Mon Sep 17 00:00:00 2001
From: Enno Hermann
Date: Sat, 14 Dec 2024 15:52:13 +0100
Subject: [PATCH] docs: add notes about xtts fine-tuning

---
 TTS/bin/synthesize.py                         | 6 +++---
 docs/source/faq.md                            | 8 +++++++-
 docs/source/training/finetuning.md            | 3 +++
 docs/source/training/index.md                 | 3 +++
 docs/source/tutorial_for_nervous_beginners.md | 3 +++
 5 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/TTS/bin/synthesize.py b/TTS/bin/synthesize.py
index 5fce93b7f4..47b442e266 100755
--- a/TTS/bin/synthesize.py
+++ b/TTS/bin/synthesize.py
@@ -34,7 +34,7 @@
 tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2
 ```
 
-#### Single Speaker Models
+#### Single speaker models
 
 - Run TTS with the default model (`tts_models/en/ljspeech/tacotron2-DDC`):
 
@@ -102,7 +102,7 @@
     --vocoder_config_path path/to/vocoder_config.json
 ```
 
-#### Multi-speaker Models
+#### Multi-speaker models
 
 - List the available speakers and choose a `<speaker_id>` among them:
 
@@ -125,7 +125,7 @@
     --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
 ```
 
-#### Voice Conversion Models
+#### Voice conversion models
 
 ```sh
 tts --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" \\
diff --git a/docs/source/faq.md b/docs/source/faq.md
index 1dd5c1847b..a0eb5bbee4 100644
--- a/docs/source/faq.md
+++ b/docs/source/faq.md
@@ -16,13 +16,19 @@ We tried to collect common issues and questions we receive about 🐸TTS. It is
   - If you need faster models, consider SpeedySpeech, GlowTTS or AlignTTS. Keep in mind that SpeedySpeech requires a pre-trained Tacotron or Tacotron2 model to compute text-to-speech alignments.
 
 ## How can I train my own `tts` model?
+
+```{note} XTTS has separate fine-tuning scripts; see [here](models/xtts.md#training).
+```
+
 0. Check your dataset with notebooks in [dataset_analysis](https://github.com/idiap/coqui-ai-TTS/tree/main/notebooks/dataset_analysis) folder.
    Use [this notebook](https://github.com/idiap/coqui-ai-TTS/blob/main/notebooks/dataset_analysis/CheckSpectrograms.ipynb) to find the right audio processing parameters. A better set of parameters results in a better audio synthesis.
 1. Write your own dataset `formatter` in `datasets/formatters.py` or [format](datasets/formatting_your_dataset) your dataset as one of the supported datasets, like LJSpeech.
 2. If you have a dataset with a different alphabet than English, you need to set your own character list in the ```config.json```.
-   - If you use phonemes for training and your language is supported [here](https://github.com/rhasspy/gruut#supported-languages), you don't need to set your character list.
+   - If you use phonemes for training and your language is supported by
+     [Espeak](https://github.com/espeak-ng/espeak-ng/blob/master/docs/languages.md)
+     or [Gruut](https://github.com/rhasspy/gruut#supported-languages), you don't need to set your character list.
    - You can use `TTS/bin/find_unique_chars.py` to get characters used in your dataset.
 3. Write your own text cleaner in ```utils.text.cleaners```. It is not always necessary, except when you have a different alphabet or language-specific requirements.
diff --git a/docs/source/training/finetuning.md b/docs/source/training/finetuning.md
index 1fe54fbcde..fa2ed34a54 100644
--- a/docs/source/training/finetuning.md
+++ b/docs/source/training/finetuning.md
@@ -29,6 +29,9 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
 
 ## Steps to fine-tune a 🐸 TTS model
 
+```{note} XTTS has separate fine-tuning scripts; see [here](../models/xtts.md#training).
+```
+
 1. Setup your dataset.
 
    You need to format your target dataset in a certain way so that 🐸TTS data loader will be able to load it for the
diff --git a/docs/source/training/index.md b/docs/source/training/index.md
index bb76a705df..b09f9cadcb 100644
--- a/docs/source/training/index.md
+++ b/docs/source/training/index.md
@@ -8,3 +8,6 @@ The following pages show you how to train and fine-tune Coqui models:
 training_a_model
 finetuning
 ```
+
+Also see the [XTTS page](../models/xtts.md#training) if you want to fine-tune
+that model.
diff --git a/docs/source/tutorial_for_nervous_beginners.md b/docs/source/tutorial_for_nervous_beginners.md
index a8a64410c4..5e5eac0e0a 100644
--- a/docs/source/tutorial_for_nervous_beginners.md
+++ b/docs/source/tutorial_for_nervous_beginners.md
@@ -29,6 +29,9 @@ CLI, server or Python API.
 
 ## Training a `tts` Model
 
+```{note} XTTS has separate fine-tuning scripts; see [here](models/xtts.md#training).
+```
+
 A breakdown of a simple script that trains a GlowTTS model on the LJspeech
 dataset. For a more in-depth guide to training and fine-tuning also see
 [this page](training/index.md).
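The faq.md hunk above keeps the advice to write your own dataset `formatter` in `datasets/formatters.py` that parses a metadata file into a list of training samples. A minimal sketch of such a formatter, assuming LJSpeech-style pipe-separated metadata; the function name and the sample-dict keys are assumptions modeled on existing 🐸TTS formatters, so verify them against `datasets/formatters.py` before use:

```python
import csv
from pathlib import Path


def my_dataset_formatter(root_path, meta_file, **kwargs):
    """Hypothetical formatter: parse a pipe-separated, LJSpeech-style
    metadata file into a list of training-sample dicts."""
    samples = []
    with open(Path(root_path) / meta_file, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="|"):
            file_id, text = row[0], row[1]
            samples.append({
                # Keys modeled on existing formatters; treat as assumptions.
                "audio_file": str(Path(root_path) / "wavs" / f"{file_id}.wav"),
                "text": text,
                "speaker_name": "my_speaker",  # single-speaker assumption
                "root_path": str(root_path),
            })
    return samples
```

Each metadata line such as `LJ001-0001|Printing, in the only sense...` then becomes one sample dict pointing at `wavs/LJ001-0001.wav`.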