This document serves as a quick lookup table for community training and finetuning results, covering a range of languages.
The models listed here are open source and are provided through voluntary contributions.
Use of these models is conditional on respecting their creators, whose efforts make this convenience possible.
Have a pretrained/finetuned result: a model checkpoint (ideally pruned to facilitate inference, i.e. keeping only ema_model_state_dict) and the corresponding vocab file (for tokenization).
Host a public Hugging Face model repository and upload the model-related files.
Make a pull request adding a model card to the current page, i.e. src\f5_tts\infer\SHARED.md.
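The pruning mentioned in the first step can be sketched as follows. This is a minimal illustration that operates on a plain dictionary standing in for a loaded checkpoint; the only key taken from this page is ema_model_state_dict, while the other keys and the helper name prune_checkpoint are hypothetical.

```python
from typing import Any, Dict

# Key named on this page as the only entry to keep for inference.
EMA_KEY = "ema_model_state_dict"


def prune_checkpoint(checkpoint: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict that keeps only the EMA weights (hypothetical helper)."""
    if EMA_KEY not in checkpoint:
        raise KeyError(f"checkpoint has no '{EMA_KEY}' entry")
    return {EMA_KEY: checkpoint[EMA_KEY]}


# Stand-in for a loaded training checkpoint. Every key except
# ema_model_state_dict is an assumed example of leftover training state.
full_checkpoint = {
    "ema_model_state_dict": {"layer.weight": [0.1, 0.2]},
    "optimizer_state_dict": {"lr": 1e-4},
    "step": 1200000,
}

pruned = prune_checkpoint(full_checkpoint)
print(sorted(pruned))
```

With a real PyTorch checkpoint, the dictionary would typically come from `torch.load(path, map_location="cpu")` and the pruned result would be written back with `torch.save`; whether your checkpoint has exactly this layout is something to verify before uploading.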
F5-TTS v1 v0 Base @ zh & en @ F5-TTS
Model: hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors
# A Variant Model: hf://SWivid/F5-TTS/F5TTS_v1_Base_no_zero_init/model_1250000.safetensors
Vocab: hf://SWivid/F5-TTS/F5TTS_v1_Base/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "conv_layers": 4}
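The Model and Vocab entries on this page use hf:// paths. As a rough sketch, such a path can be split into a Hugging Face repository id and a file path inside that repository. The assumption here, not stated on this page, is that the first two segments form the repository id and everything after them is the file path; split_hf_uri is a hypothetical helper name.

```python
from typing import Tuple

HF_PREFIX = "hf://"


def split_hf_uri(uri: str) -> Tuple[str, str]:
    """Split 'hf://owner/repo/path/to/file' into (repo_id, filename).

    Assumes the first two segments are the repository id.
    """
    if not uri.startswith(HF_PREFIX):
        raise ValueError(f"not an hf:// path: {uri!r}")
    parts = uri[len(HF_PREFIX):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://owner/repo/file, got {uri!r}")
    owner, repo, filename = parts
    return f"{owner}/{repo}", filename


repo_id, filename = split_hf_uri(
    "hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors"
)
print(repo_id)
print(filename)
```

The resulting pair matches the `repo_id` and `filename` arguments of `hf_hub_download` in the `huggingface_hub` package, which is one way to fetch the file manually.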
Model: hf://SWivid/F5-TTS/F5TTS_Base/model_1200000.safetensors
Vocab: hf://SWivid/F5-TTS/F5TTS_Base/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
Other info, e.g. author info, GitHub repo, links to sampled results, usage instructions, tutorials (blog, video, etc.) ...
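The Config entries are written as Python dictionary literals (note the capitalized False), so a strict JSON parser would reject them. A minimal sketch of reading one into a dictionary, assuming the line keeps the "Config: {...}" shape used on this page; parse_config_line is a hypothetical helper, and it strips stray whitespace from keys in case the line was copied with extra spaces.

```python
import ast
from typing import Any, Dict


def parse_config_line(line: str) -> Dict[str, Any]:
    """Parse a 'Config: {...}' line into a dict (hypothetical helper)."""
    label, sep, payload = line.partition(":")
    if not sep or label.strip() != "Config":
        raise ValueError(f"not a Config line: {line!r}")
    # literal_eval only evaluates Python literals, never arbitrary code.
    value = ast.literal_eval(payload.strip())
    if not isinstance(value, dict):
        raise ValueError("Config payload is not a dictionary")
    return {str(key).strip(): val for key, val in value.items()}


config = parse_config_line(
    'Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, '
    '"text_dim": 512, "text_mask_padding": False, "conv_layers": 4, '
    '"pe_attn_head": 1}'
)
print(config["dim"], config["text_mask_padding"])
```

How the resulting dictionary is passed to the model constructor depends on the inference code in use, so treat the key names as data from the model card rather than a guaranteed API.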
F5-TTS Base @ fi @ AsmoKoskinen
Model: hf://AsmoKoskinen/F5-TTS_Finnish_Model/model_common_voice_fi_vox_populi_fi_20241206.safetensors
Vocab: hf://AsmoKoskinen/F5-TTS_Finnish_Model/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
F5-TTS Base @ fr @ RASPIAUDIO
Model: hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/model_last_reduced.pt
Vocab: hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
F5-TTS Small @ hi @ SPRINGLab
Model: hf://SPRINGLab/F5-Hindi-24KHz/model_2500000.safetensors
Vocab: hf://SPRINGLab/F5-Hindi-24KHz/vocab.txt
Config: {"dim": 768, "depth": 18, "heads": 12, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
F5-TTS Base @ it @ alien79
Model: hf://alien79/F5-TTS-italian/model_159600.safetensors
Vocab: hf://alien79/F5-TTS-italian/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
Model: hf://Jmica/F5TTS/JA_25498980/model_25498980.pt
Vocab: hf://Jmica/F5TTS/JA_25498980/vocab_updated.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
F5-TTS Base @ ru @ HotDro4illa
Model: hf://hotstone228/F5-TTS-Russian/model_last.safetensors
Vocab: hf://hotstone228/F5-TTS-Russian/vocab.txt
Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
F5-TTS Base @ es @ jpgallegoar
| Model | 🤗Hugging Face | Data (Hours) | Model License |
| --- | --- | --- | --- |
| F5-TTS Base | ckpt & vocab | Voxpopuli & Crowdsourced & TEDx, 218 hours | cc0-1.0 |
@jpgallegoar GitHub repo, Jupyter Notebook, and Gradio usage for the Spanish model.