ESPnet Notebooks

Demo

ASR (Speech recognition)

asr_realtime_demo.ipynb: ASR realtime inference with various pre-trained models.
asr_transfer_learning_demo.ipynb: Demo on how to use pre-trained ASR models for fine-tuning.
streaming_asr_demo.ipynb: Streaming ASR realtime inference with pre-trained models.

SE (Speech enhancement/separation)

se_demo.ipynb: Speech enhancement/separation inference with various pre-trained models.
se_demo_for_waspaa_2021.ipynb: WASPAA 2021 version of the ESPnet-SE demo.

SLU (Spoken language understanding)

2pass_slu_demo.ipynb: Examples of two-pass spoken language understanding with pre-trained models.

TTS (Text-to-speech)

tts_realtime_demo.ipynb: TTS realtime inference with various pre-trained models.

Other utilities

onnx_conversion_demo.ipynb: How to convert ESPnet models into ONNX format.

ESPnet-EZ

ASR (Speech recognition)

train_from_scratch.ipynb: Training an ASR model with ESPnet-EZ on LibriSpeech-100.
ASR_finetune_owsm.ipynb: Fine-tuning the weakly supervised model (OWSM) with ESPnet-EZ on a custom dataset.

ST (Speech-to-text translation)

integrate_huggingface.ipynb: Integrating the weakly supervised model (OWSM) and a Hugging Face pre-trained language model with ESPnet-EZ on MuST-C-v2.
ST_finetune_owsm.ipynb: Fine-tuning the weakly supervised model (OWSM) with ESPnet-EZ on MuST-C-v2.

SLU (Spoken language understanding)

SLU_finetune_owsm.ipynb: Fine-tuning the weakly supervised model (OWSM) with ESPnet-EZ on SLURP.

TTS (Text-to-speech)

TTS_finetune_vctk_dump.ipynb: Fine-tuning a pre-trained VITS model with ESPnet-EZ on the VCTK dataset.

SVS (Singing voice synthesis)

SVS_finetune_ace-kising.ipynb: Fine-tuning a pre-trained VISinger 2 model with ESPnet-EZ on ACE-KiSing.

Course

CMU Speech Processing, Spring 2023

assignment0_data-prep.ipynb: Course assignment on how to prepare ESPnet-format data.
assignment1_espnet-tutorial.ipynb: A simplified version of the previous year's new-task tutorial.
assignemnt3_spk.ipynb: Examples of using ESPnet to extract speaker embeddings and conduct speaker recognition.
assignment4_ssl.ipynb: Exploration of using self-supervised speech representations in ESPnet ASR training.
assignment5_st.ipynb: Examples of state-of-the-art speech translation models in ESPnet.
assignment6_slu.ipynb: Examples of state-of-the-art spoken language understanding models in ESPnet.
assignment7_se.ipynb: Examples of state-of-the-art speech enhancement/separation models in ESPnet.
assignment8_tts.ipynb: A student version of espnet2-tts realtime demonstration.
s2st_demo.ipynb: An example of an existing speech-to-speech translation model in ESPnet.

CMU Speech Recognition, Fall 2022

recipe_tutorial.ipynb: A general tutorial with a stage-by-stage explanation of ESPnet2 recipes (with new functions).
new_task_tutorial.ipynb: A tutorial on how to add new models/tasks to the ESPnet framework.

CMU Speech Recognition, Fall 2021

general_tutorial.ipynb: A general tutorial with a stage-by-stage explanation of ESPnet2 recipes.

ESPnet1 (Legacy)

asr_library.ipynb: Explanation of the speech recognition library, including network training.
asr_recipe.ipynb: Speech recognition recipe explanation.
pretrained.ipynb: Tutorial on how to use pre-trained models.
st_demo.ipynb: Speech translation demonstration with a TTS model to achieve speech-to-speech translation.
tts_realtime_demo.ipynb: TTS demonstration with different pre-trained TTS models.
tts_recipe.ipynb: Stage-by-stage explanation of TTS recipes.