dataset preprocess #742

CreepJoye · 2025-01-22T10:51:07Z

Checks

This template is only for usage issues encountered.
I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
I have searched for existing issues, including closed ones, and couldn't find a solution.
I confirm that I am using English to submit this report in order to facilitate communication.

Environment Details

python=3.10

Steps to Reproduce

Hi there, thanks for your amazing work on open-sourcing F5-TTS! While training F5 on my own dataset, I noticed an issue: the generated audio quality drops significantly when the dataset contains some audio clips longer than 20 seconds.

I saw there's a slice function in src/f5_tts/train/finetune_gradio.py that can split audio based on silent segments. So, my question is: when working with longer audio clips (e.g., over 20 seconds), do I need to slice them into shorter segments, transcribe them, and then use them for training? Or is it okay to include a small number of long audio clips directly? I'm planning to use a large dataset later, and slicing and transcribing all long audio files might be too much work.
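For illustration, here is a minimal sketch of the general idea of cutting long audio at silent points. It is not the slice function in finetune_gradio.py: the function names, the RMS threshold, and the frame size are all assumptions, and it works on an in-memory list of float samples in the range -1.0 to 1.0.

```python
import math


def frame_rms(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame."""
    rms = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return rms


def split_on_silence(samples, sample_rate, max_seconds=20.0,
                     frame_seconds=0.02, silence_threshold=0.01):
    """Return (start, end) sample indices of segments of at most max_seconds.

    Hypothetical helper: cuts at the start of the latest silent frame inside
    each window, and falls back to a hard cut when no silent frame is found.
    """
    frame_len = max(1, int(sample_rate * frame_seconds))
    max_len = max(1, int(sample_rate * max_seconds))
    rms = frame_rms(samples, frame_len)
    # Sample index at which each silent frame begins.
    silent_cuts = [i * frame_len for i, e in enumerate(rms)
                   if e < silence_threshold]
    segments = []
    start, total = 0, len(samples)
    while total - start > max_len:
        limit = start + max_len
        candidates = [c for c in silent_cuts if start < c <= limit]
        cut = candidates[-1] if candidates else limit
        segments.append((start, cut))
        start = cut
    if start < total:
        segments.append((start, total))
    return segments
```

Each returned segment would still need its own transcript before it can be used for training, which is the extra work the question is asking about.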

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

SWivid · 2025-01-27T11:38:44Z

Hi @CreepJoye ,
it's OK to use audio clips under 30 seconds in total length (mainly for #719).
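A minimal sketch of how one might check a dataset against that limit before training, assuming the clips are WAV files in a single folder and the limit is measured in seconds. The helper names and the folder layout are hypothetical and not part of F5-TTS; only the Python standard library is used.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Duration of a WAV file, computed from its frame count and sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def find_long_clips(wav_dir, max_seconds=30.0):
    """Return (file name, duration) for every clip at or above max_seconds."""
    long_clips = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        duration = wav_duration_seconds(path)
        if duration >= max_seconds:
            long_clips.append((path.name, duration))
    return long_clips
```

Calling `find_long_clips("my_dataset/wavs")` (a placeholder path) would list only the clips that still need slicing, so the shorter ones can be left as they are.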

CreepJoye added the help wanted label Jan 22, 2025

SWivid closed this as completed Feb 2, 2025
