
Commit cd79723
docs: use nested contents for easier overview
Parent: 1c7924f

14 files changed, +70 -26 lines

docs/source/formatting_your_dataset.md → docs/source/datasets/formatting_your_dataset.md (+3 -1)

@@ -1,7 +1,9 @@
 (formatting_your_dataset)=
 # Formatting your dataset
 
-For training a TTS model, you need a dataset with speech recordings and transcriptions. The speech must be divided into audio clips and each clip needs transcription.
+For training a TTS model, you need a dataset with speech recordings and
+transcriptions. The speech must be divided into audio clips and each clip needs
+a transcription.
 
 If you have a single audio file and you need to split it into clips, there are different open-source tools for you. We recommend Audacity. It is an open-source and free audio editing software.
 

docs/source/datasets/index.md (new file, +12)

@@ -0,0 +1,12 @@
+# Datasets
+
+For training a TTS model, you need a dataset with speech recordings and
+transcriptions. See the following pages for more information on:
+
+```{toctree}
+:maxdepth: 1
+
+formatting_your_dataset
+what_makes_a_good_dataset
+tts_datasets
+```
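As far as it can be read from this diff, the dataset pages now live together under a `datasets/` subdirectory, with the new `index.md` acting as the entry point:

```
docs/source/datasets/
├── index.md
├── formatting_your_dataset.md
├── what_makes_a_good_dataset.md
└── tts_datasets.md
```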

docs/source/tts_datasets.md → docs/source/datasets/tts_datasets.md (+2 -2)

@@ -1,6 +1,6 @@
-# TTS datasets
+# Public TTS datasets
 
-Some of the known public datasets that we successfully applied 🐸TTS:
+Some of the known public datasets that were successfully used for 🐸TTS:
 
 - [English - LJ Speech](https://keithito.com/LJ-Speech-Dataset/)
 - [English - Nancy](http://www.cstr.ed.ac.uk/projects/blizzard/2011/lessac_blizzard2011/)

docs/source/implementing_a_new_model.md → docs/source/extension/implementing_a_new_model.md (+1 -1)

@@ -36,7 +36,7 @@
 There is also the `callback` interface by which you can manipulate both the model and the `Trainer` states. Callbacks give you
 an infinite flexibility to add custom behaviours for your model and training routines.
 
-For more details, see [BaseTTS](main_classes/model_api.md#base-tts-model) and :obj:`TTS.utils.callbacks`.
+For more details, see [BaseTTS](../main_classes/model_api.md#base-tts-model) and :obj:`TTS.utils.callbacks`.
 
 6. Optionally, define `MyModelArgs`.

docs/source/extension/index.md (new file, +14)

@@ -0,0 +1,14 @@
+# Adding models or languages
+
+You can extend Coqui by implementing new model architectures or adding front
+ends for new languages. See the pages below for more details. The [project
+structure](../project_structure.md) and [contribution
+guidelines](../contributing.md) may also be helpful. Please open a pull request
+with your changes to share back the improvements with the community.
+
+```{toctree}
+:maxdepth: 1
+
+implementing_a_new_model
+implementing_a_new_language_frontend
+```

docs/source/faq.md (+2 -2)

@@ -7,7 +7,7 @@ We tried to collect common issues and questions we receive about 🐸TTS. It is
 - If you feel like it's a bug to be fixed, then prefer Github issues with the same level of scrutiny.
 
 ## What are the requirements of a good 🐸TTS dataset?
-- [See this page](what_makes_a_good_dataset.md)
+- [See this page](datasets/what_makes_a_good_dataset.md)
 
 ## How should I choose the right model?
 - First, train Tacotron. It is smaller and faster to experiment with. If it performs poorly, try Tacotron2.
@@ -18,7 +18,7 @@ We tried to collect common issues and questions we receive about 🐸TTS. It is
 ## How can I train my own `tts` model?
 0. Check your dataset with notebooks in [dataset_analysis](https://github.com/idiap/coqui-ai-TTS/tree/main/notebooks/dataset_analysis) folder. Use [this notebook](https://github.com/idiap/coqui-ai-TTS/blob/main/notebooks/dataset_analysis/CheckSpectrograms.ipynb) to find the right audio processing parameters. A better set of parameters results in a better audio synthesis.
 
-1. Write your own dataset `formatter` in `datasets/formatters.py` or format your dataset as one of the supported datasets, like LJSpeech.
+1. Write your own dataset `formatter` in `datasets/formatters.py` or [format](datasets/formatting_your_dataset) your dataset as one of the supported datasets, like LJSpeech.
 A `formatter` parses the metadata file and converts a list of training samples.
 
 2. If you have a dataset with a different alphabet than English, you need to set your own character list in the ```config.json```.

docs/source/index.md (+7 -9)

@@ -4,10 +4,10 @@
 ```
 ----
 
-# Documentation Content
 ```{toctree}
 :maxdepth: 1
 :caption: Get started
+:hidden:
 
 tutorial_for_nervous_beginners
 installation
@@ -20,22 +20,19 @@ contributing
 ```{toctree}
 :maxdepth: 1
 :caption: Using Coqui
+:hidden:
 
 inference
-training_a_model
-finetuning
-implementing_a_new_model
-implementing_a_new_language_frontend
-formatting_your_dataset
-what_makes_a_good_dataset
-tts_datasets
-marytts
+training/index
+extension/index
+datasets/index
 ```
 
 
 ```{toctree}
 :maxdepth: 1
 :caption: Main Classes
+:hidden:
 
 configuration
 main_classes/trainer_api
@@ -50,6 +47,7 @@ main_classes/speaker_manager
 ```{toctree}
 :maxdepth: 1
 :caption: TTS Models
+:hidden:
 
 models/glow_tts.md
 models/vits.md
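The `:hidden:` option added to each toctree above is standard Sphinx behavior: the listed documents stay registered in the site hierarchy and sidebar navigation, but the table of contents is no longer rendered in the body of `index.md` itself, which is why the `# Documentation Content` heading could be dropped. A minimal sketch of such a hidden, nested toctree in MyST markdown (the caption and page names here are illustrative, mirroring the diff):

````markdown
```{toctree}
:maxdepth: 1
:caption: Using Coqui
:hidden:

inference
training/index
datasets/index
```
````

Each `*/index` entry is itself a page with its own toctree, so the sidebar expands one level at a time instead of listing every page flat.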

docs/source/inference.md (+7 -2)

@@ -86,8 +86,8 @@ tts --model_name "voice_conversion/<language>/<dataset>/<model_name>"
 
 You can boot up a demo 🐸TTS server to run an inference with your models (make
 sure to install the additional dependencies with `pip install coqui-tts[server]`).
-Note that the server is not optimized for performance but gives you an easy way
-to interact with the models.
+Note that the server is not optimized for performance and does not support all
+Coqui models yet.
 
 The demo server provides pretty much the same interface as the CLI command.
 
@@ -192,3 +192,8 @@ api.tts_with_vc_to_file(
 file_path="ouptut.wav"
 )
 ```
+
+```{toctree}
+:hidden:
+marytts
+```

docs/source/finetuning.md → docs/source/training/finetuning.md (+3 -3)

@@ -22,7 +22,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
 speech dataset and achieve reasonable results with only a couple of hours of data.
 
 However, note that, fine-tuning does not ensure great results. The model
-performance still depends on the [dataset quality](what_makes_a_good_dataset.md)
+performance still depends on the [dataset quality](../datasets/what_makes_a_good_dataset.md)
 and the hyper-parameters you choose for fine-tuning. Therefore,
 it still takes a bit of tinkering.
 
@@ -32,7 +32,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
 1. Setup your dataset.
 
 You need to format your target dataset in a certain way so that 🐸TTS data loader will be able to load it for the
-training. Please see [this page](formatting_your_dataset.md) for more information about formatting.
+training. Please see [this page](../datasets/formatting_your_dataset.md) for more information about formatting.
 
 2. Choose the model you want to fine-tune.
 
@@ -49,7 +49,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
 You should choose the model based on your requirements. Some models are fast and some are better in speech quality.
 One lazy way to test a model is running the model on the hardware you want to use and see how it works. For
 simple testing, you can use the `tts` command on the terminal. For more info
-see [here](inference.md).
+see [here](../inference.md).
 
 3. Download the model.

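Most of the churn in this commit is mechanical: each moved file sits one directory deeper, so its outgoing links gain a `../` prefix, and links pointing at it gain the new subdirectory. The rule can be sketched with plain path arithmetic (this helper is hypothetical, not part of the repo):

```python
import os

def new_link(new_src: str, new_target: str) -> str:
    """Relative link from a moved doc to a (possibly also moved) target.

    Both arguments are repo-relative paths *after* the move.
    """
    rel = os.path.relpath(new_target, os.path.dirname(new_src))
    # Sphinx/MyST links always use forward slashes, even on Windows.
    return rel.replace(os.sep, "/")

# finetuning.md moved into training/, its target into datasets/:
print(new_link("docs/source/training/finetuning.md",
               "docs/source/datasets/what_makes_a_good_dataset.md"))
# -> ../datasets/what_makes_a_good_dataset.md
```

The same computation reproduces the other link fixes in this commit, e.g. `../main_classes/model_api.md` from the new `extension/` directory.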
docs/source/training/index.md (new file, +10)

@@ -0,0 +1,10 @@
+# Training and fine-tuning
+
+The following pages show you how to train and fine-tune Coqui models:
+
+```{toctree}
+:maxdepth: 1
+
+training_a_model
+finetuning
+```

docs/source/training_a_model.md → docs/source/training/training_a_model.md (+4 -5)

@@ -11,11 +11,10 @@
 
 3. Check the recipes.
 
-Recipes are located under `TTS/recipes/`. They do not promise perfect models but they provide a good start point for
-`Nervous Beginners`.
+Recipes are located under `TTS/recipes/`. They do not promise perfect models but they provide a good start point.
 A recipe for `GlowTTS` using `LJSpeech` dataset looks like below. Let's be creative and call this `train_glowtts.py`.
 
-```{literalinclude} ../../recipes/ljspeech/glow_tts/train_glowtts.py
+```{literalinclude} ../../../recipes/ljspeech/glow_tts/train_glowtts.py
 ```
 
 You need to change fields of the `BaseDatasetConfig` to match your dataset and then update `GlowTTSConfig`
@@ -113,7 +112,7 @@
 
 Note that different models have different metrics, visuals and outputs.
 
-You should also check the [FAQ page](https://github.com/coqui-ai/TTS/wiki/FAQ) for common problems and solutions
+You should also check the [FAQ page](../faq.md) for common problems and solutions
 that occur in a training.
 
 7. Use your best model for inference.
@@ -142,5 +141,5 @@ d-vectors. For using d-vectors, you first need to compute the d-vectors using th
 
 The same Glow-TTS model above can be trained on a multi-speaker VCTK dataset with the script below.
 
-```{literalinclude} ../../recipes/vctk/glow_tts/train_glow_tts.py
+```{literalinclude} ../../../recipes/vctk/glow_tts/train_glow_tts.py
 ```

docs/source/tutorial_for_nervous_beginners.md (+5 -1)

@@ -24,10 +24,14 @@ $ tts-server --list_models # list the available models.
 ```
 ![server.gif](https://github.com/idiap/coqui-ai-TTS/raw/main/images/demo_server.gif)
 
+See [this page](inference.md) for more details on synthesizing speech with the
+CLI, server or Python API.
 
 ## Training a `tts` Model
 
-A breakdown of a simple script that trains a GlowTTS model on the LJspeech dataset. See the comments for more details.
+A breakdown of a simple script that trains a GlowTTS model on the LJspeech
+dataset. For a more in-depth guide to training and fine-tuning also see [this
+page](training/index.md).
 
 ### Pure Python Way