
Commit cd79723
docs: use nested contents for easier overview
Parent: 1c7924f

14 files changed, +70 -26 lines

docs/source/formatting_your_dataset.md → docs/source/datasets/formatting_your_dataset.md (+3 -1)

@@ -1,7 +1,9 @@
 (formatting_your_dataset)=
 # Formatting your dataset
 
-For training a TTS model, you need a dataset with speech recordings and transcriptions. The speech must be divided into audio clips and each clip needs transcription.
+For training a TTS model, you need a dataset with speech recordings and
+transcriptions. The speech must be divided into audio clips and each clip needs
+a transcription.
 
 If you have a single audio file and you need to split it into clips, there are different open-source tools for you. We recommend Audacity. It is an open-source and free audio editing software.
 

docs/source/datasets/index.md (new file, +12)

@@ -0,0 +1,12 @@
+# Datasets
+
+For training a TTS model, you need a dataset with speech recordings and
+transcriptions. See the following pages for more information on:
+
+```{toctree}
+:maxdepth: 1
+
+formatting_your_dataset
+what_makes_a_good_dataset
+tts_datasets
+```
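As far as it can be read from this diff, the dataset pages now live together under a `datasets/` subdirectory, with the new `index.md` acting as the entry point:

```
docs/source/datasets/
├── index.md
├── formatting_your_dataset.md
├── what_makes_a_good_dataset.md
└── tts_datasets.md
```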

docs/source/tts_datasets.md → docs/source/datasets/tts_datasets.md (+2 -2)

@@ -1,6 +1,6 @@
-# TTS datasets
+# Public TTS datasets
 
-Some of the known public datasets that we successfully applied 🐸TTS:
+Some of the known public datasets that were successfully used for 🐸TTS:
 
 - [English - LJ Speech](https://keithito.com/LJ-Speech-Dataset/)
 - [English - Nancy](http://www.cstr.ed.ac.uk/projects/blizzard/2011/lessac_blizzard2011/)

docs/source/implementing_a_new_model.md → docs/source/extension/implementing_a_new_model.md (+1 -1)

@@ -36,7 +36,7 @@
 There is also the `callback` interface by which you can manipulate both the model and the `Trainer` states. Callbacks give you
 an infinite flexibility to add custom behaviours for your model and training routines.
 
-For more details, see [BaseTTS](main_classes/model_api.md#base-tts-model) and :obj:`TTS.utils.callbacks`.
+For more details, see [BaseTTS](../main_classes/model_api.md#base-tts-model) and :obj:`TTS.utils.callbacks`.
 
 6. Optionally, define `MyModelArgs`.

docs/source/extension/index.md (new file, +14)

@@ -0,0 +1,14 @@
+# Adding models or languages
+
+You can extend Coqui by implementing new model architectures or adding front
+ends for new languages. See the pages below for more details. The [project
+structure](../project_structure.md) and [contribution
+guidelines](../contributing.md) may also be helpful. Please open a pull request
+with your changes to share back the improvements with the community.
+
+```{toctree}
+:maxdepth: 1
+
+implementing_a_new_model
+implementing_a_new_language_frontend
+```

docs/source/faq.md (+2 -2)

@@ -7,7 +7,7 @@ We tried to collect common issues and questions we receive about 🐸TTS. It is
 - If you feel like it's a bug to be fixed, then prefer Github issues with the same level of scrutiny.
 
 ## What are the requirements of a good 🐸TTS dataset?
-- [See this page](what_makes_a_good_dataset.md)
+- [See this page](datasets/what_makes_a_good_dataset.md)
 
 ## How should I choose the right model?
 - First, train Tacotron. It is smaller and faster to experiment with. If it performs poorly, try Tacotron2.
@@ -18,7 +18,7 @@ We tried to collect common issues and questions we receive about 🐸TTS. It is
 ## How can I train my own `tts` model?
 0. Check your dataset with notebooks in [dataset_analysis](https://github.com/idiap/coqui-ai-TTS/tree/main/notebooks/dataset_analysis) folder. Use [this notebook](https://github.com/idiap/coqui-ai-TTS/blob/main/notebooks/dataset_analysis/CheckSpectrograms.ipynb) to find the right audio processing parameters. A better set of parameters results in a better audio synthesis.
 
-1. Write your own dataset `formatter` in `datasets/formatters.py` or format your dataset as one of the supported datasets, like LJSpeech.
+1. Write your own dataset `formatter` in `datasets/formatters.py` or [format](datasets/formatting_your_dataset) your dataset as one of the supported datasets, like LJSpeech.
 A `formatter` parses the metadata file and converts a list of training samples.
 
 2. If you have a dataset with a different alphabet than English, you need to set your own character list in the ```config.json```.

docs/source/index.md (+7 -9)

@@ -4,10 +4,10 @@
 ```
 ----
 
-# Documentation Content
 ```{toctree}
 :maxdepth: 1
 :caption: Get started
+:hidden:
 
 tutorial_for_nervous_beginners
 installation
@@ -20,22 +20,19 @@ contributing
 ```{toctree}
 :maxdepth: 1
 :caption: Using Coqui
+:hidden:
 
 inference
-training_a_model
-finetuning
-implementing_a_new_model
-implementing_a_new_language_frontend
-formatting_your_dataset
-what_makes_a_good_dataset
-tts_datasets
-marytts
+training/index
+extension/index
+datasets/index
 ```
 
 
 ```{toctree}
 :maxdepth: 1
 :caption: Main Classes
+:hidden:
 
 configuration
 main_classes/trainer_api
@@ -50,6 +47,7 @@ main_classes/speaker_manager
 ```{toctree}
 :maxdepth: 1
 :caption: TTS Models
+:hidden:
 
 models/glow_tts.md
 models/vits.md
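The `:hidden:` option added to each toctree above is standard Sphinx behavior: the listed documents stay registered in the site hierarchy and sidebar navigation, but the table of contents is no longer rendered in the body of `index.md` itself, which is why the `# Documentation Content` heading could be dropped. A minimal sketch of such a hidden, nested toctree in MyST markdown (the caption and page names here are illustrative, mirroring the diff):

````markdown
```{toctree}
:maxdepth: 1
:caption: Using Coqui
:hidden:

inference
training/index
datasets/index
```
````

Each `*/index` entry is itself a page with its own toctree, so the sidebar expands one level at a time instead of listing every page flat.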

docs/source/inference.md (+7 -2)

@@ -86,8 +86,8 @@ tts --model_name "voice_conversion/<language>/<dataset>/<model_name>"
 
 You can boot up a demo 🐸TTS server to run an inference with your models (make
 sure to install the additional dependencies with `pip install coqui-tts[server]`).
-Note that the server is not optimized for performance but gives you an easy way
-to interact with the models.
+Note that the server is not optimized for performance and does not support all
+Coqui models yet.
 
 The demo server provides pretty much the same interface as the CLI command.
 
@@ -192,3 +192,8 @@ api.tts_with_vc_to_file(
 file_path="ouptut.wav"
 )
 ```
+
+```{toctree}
+:hidden:
+marytts
+```

docs/source/finetuning.md → docs/source/training/finetuning.md (+3 -3)

@@ -22,7 +22,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
 speech dataset and achieve reasonable results with only a couple of hours of data.
 
 However, note that, fine-tuning does not ensure great results. The model
-performance still depends on the [dataset quality](what_makes_a_good_dataset.md)
+performance still depends on the [dataset quality](../datasets/what_makes_a_good_dataset.md)
 and the hyper-parameters you choose for fine-tuning. Therefore,
 it still takes a bit of tinkering.
 
@@ -32,7 +32,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
 1. Setup your dataset.
 
 You need to format your target dataset in a certain way so that 🐸TTS data loader will be able to load it for the
-training. Please see [this page](formatting_your_dataset.md) for more information about formatting.
+training. Please see [this page](../datasets/formatting_your_dataset.md) for more information about formatting.
 
 2. Choose the model you want to fine-tune.
 
@@ -49,7 +49,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
 You should choose the model based on your requirements. Some models are fast and some are better in speech quality.
 One lazy way to test a model is running the model on the hardware you want to use and see how it works. For
 simple testing, you can use the `tts` command on the terminal. For more info
-see [here](inference.md).
+see [here](../inference.md).
 
 3. Download the model.

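Most of the churn in this commit is mechanical: each moved file sits one directory deeper, so its outgoing links gain a `../` prefix, and links pointing at it gain the new subdirectory. The rule can be sketched with plain path arithmetic (this helper is hypothetical, not part of the repo):

```python
import os

def new_link(new_src: str, new_target: str) -> str:
    """Relative link from a moved doc to a (possibly also moved) target.

    Both arguments are repo-relative paths *after* the move.
    """
    rel = os.path.relpath(new_target, os.path.dirname(new_src))
    # Sphinx/MyST links always use forward slashes, even on Windows.
    return rel.replace(os.sep, "/")

# finetuning.md moved into training/, its target into datasets/:
print(new_link("docs/source/training/finetuning.md",
               "docs/source/datasets/what_makes_a_good_dataset.md"))
# -> ../datasets/what_makes_a_good_dataset.md
```

The same computation reproduces the other link fixes in this commit, e.g. `../main_classes/model_api.md` from the new `extension/` directory.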
docs/source/training/index.md (new file, +10)

@@ -0,0 +1,10 @@
+# Training and fine-tuning
+
+The following pages show you how to train and fine-tune Coqui models:
+
+```{toctree}
+:maxdepth: 1
+
+training_a_model
+finetuning
+```

docs/source/training_a_model.md → docs/source/training/training_a_model.md (+4 -5)

@@ -11,11 +11,10 @@
 
 3. Check the recipes.
 
-Recipes are located under `TTS/recipes/`. They do not promise perfect models but they provide a good start point for
-`Nervous Beginners`.
+Recipes are located under `TTS/recipes/`. They do not promise perfect models but they provide a good start point.
 A recipe for `GlowTTS` using `LJSpeech` dataset looks like below. Let's be creative and call this `train_glowtts.py`.
 
-```{literalinclude} ../../recipes/ljspeech/glow_tts/train_glowtts.py
+```{literalinclude} ../../../recipes/ljspeech/glow_tts/train_glowtts.py
 ```
 
 You need to change fields of the `BaseDatasetConfig` to match your dataset and then update `GlowTTSConfig`
@@ -113,7 +112,7 @@
 
 Note that different models have different metrics, visuals and outputs.
 
-You should also check the [FAQ page](https://github.com/coqui-ai/TTS/wiki/FAQ) for common problems and solutions
+You should also check the [FAQ page](../faq.md) for common problems and solutions
 that occur in a training.
 
 7. Use your best model for inference.
@@ -142,5 +141,5 @@ d-vectors. For using d-vectors, you first need to compute the d-vectors using th
 
 The same Glow-TTS model above can be trained on a multi-speaker VCTK dataset with the script below.
 
-```{literalinclude} ../../recipes/vctk/glow_tts/train_glow_tts.py
+```{literalinclude} ../../../recipes/vctk/glow_tts/train_glow_tts.py
 ```

docs/source/tutorial_for_nervous_beginners.md (+5 -1)

@@ -24,10 +24,14 @@ $ tts-server --list_models # list the available models.
 ```
 ![server.gif](https://github.com/idiap/coqui-ai-TTS/raw/main/images/demo_server.gif)
 
+See [this page](inference.md) for more details on synthesizing speech with the
+CLI, server or Python API.
 
 ## Training a `tts` Model
 
-A breakdown of a simple script that trains a GlowTTS model on the LJspeech dataset. See the comments for more details.
+A breakdown of a simple script that trains a GlowTTS model on the LJspeech
+dataset. For a more in-depth guide to training and fine-tuning also see [this
+page](training/index.md).
 
 ### Pure Python Way