Generated audio swapping accents over time #7
Comments
Unfortunately this is an issue with the underlying model. Try shortening your input audio to 5-9 seconds where the accent is very noticeable; that might help.
I had to get a pretty good quality, clean sample of someone for it to sound, and keep sounding, like them. There is a very occasional slip in the audio, but 95%+ sounds good. I also don't think longer clips necessarily give better results (though I've not done much testing on that and kept my samples around the 8-9 second mark). Other things may help too:
I've not tested this yet, but I also wonder: if you use an audio clip that was itself AI-generated, will the output come out sounding right vs. genuine audio of a real person? There could be a law of diminishing returns causing degradation in quality. My current experience is that the better the sample, the more the output sounds like the original person, their accent, nuances, etc. EDIT: Changed the suggested Hz.
The model samples at 24 kHz mono, so that's probably what you want your source audio to be.
Love the TTS, this is amazing. However, I thought I would bring up that regardless of the clip I use or its format (WAV or MP3), and despite it being perfect, the generated speech will always drift between an American accent and a British one.
Is there a known way to label the audio sample as American or British so the model knows which one it should stick to?
If this topic needs to go elsewhere, please let me know.