Generated audio swapping accents over time #7

Open
spike4379 opened this issue Nov 21, 2023 · 3 comments

Comments

@spike4379

Love the TTS, this is amazing. However, I thought I would bring up that no matter which clip I use, whether it is WAV or MP3, and even when the clip itself is perfect, the generated speech always drifts between an American accent and a British one.
Is there a known way to label the audio sample as American or British so the model knows which accent it should stick to?

If this topic needs to go elsewhere please let me know.

@kanttouchthis
Owner

Unfortunately this is an issue with the underlying model. Try shortening your input audio to 5-9 seconds where the accent is very noticeable; that might help.
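
A minimal sketch of the trimming step suggested above, assuming the reference clip is already an uncompressed PCM WAV file. It uses only Python's standard `wave` module; the function name `trim_wav` and the file names in the usage note are hypothetical and not part of this project.

```python
import wave


def trim_wav(src_path, dst_path, start_s=0.0, duration_s=8.0):
    """Copy up to `duration_s` seconds of audio, starting at `start_s`, into a new WAV.

    Returns the length in seconds of the clip that was actually written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Clamp the start position so it never points past the end of the file.
        start_frame = min(int(start_s * rate), src.getnframes())
        src.setpos(start_frame)
        # readframes() simply returns fewer frames if the file ends early.
        frames = src.readframes(int(duration_s * rate))

    with wave.open(dst_path, "wb") as dst:
        # Same channels, sample width and rate as the source; the frame count
        # in the header is corrected automatically when the data is written.
        dst.setparams(params)
        dst.writeframes(frames)

    return len(frames) / (params.sampwidth * params.nchannels * rate)
```

For example, `trim_wav("voice_full.wav", "voice_ref.wav", start_s=12.0, duration_s=8.0)` would keep the 8 seconds starting at the 12 second mark. Picking the window where the accent is clearest still has to be done by ear.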

@erew123

erew123 commented Nov 21, 2023

I had to use a good-quality, clean sample of someone for the output to sound, and keep sounding, like them. There is a very occasional slip in the audio, but 95%+ sounds good. I also don't think longer clips necessarily give better results (though I've not done much testing on that and have kept my samples around the 8-9 second mark).

Other things that may help:

  • Make sure the audio is downsampled to mono, 24000 Hz, 16-bit.
  • If you need to do any audio cleaning, do it before you convert the clip down to the above settings.
  • Ensure the clip you use doesn't have background noise or music in it, e.g. lots of movies have quiet music playing while the actors are talking. Bad-quality audio will have hiss that needs clearing up. The AI will pick this up, even if we don't.
  • Try to make your clip one of nice, flowing speech, like the included example.wav file.
  • Make sure the clip doesn't start or end with breathy sounds (breathing in/out etc.).
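
As a rough illustration of the first point, here is a standard-library-only sketch that converts a 16-bit PCM WAV to mono at 24000 Hz. It assumes 16-bit input and uses plain linear interpolation with no anti-aliasing filter, so a dedicated audio tool will likely give cleaner results; the name `to_mono_24k` is hypothetical and not part of this project.

```python
import array
import wave

TARGET_RATE = 24000  # Hz, the rate suggested in this thread


def to_mono_24k(src_path, dst_path, target_rate=TARGET_RATE):
    """Write a mono, 16-bit copy of a 16-bit PCM WAV at `target_rate`.

    Returns the number of frames written.
    """
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM input")
        n_channels = src.getnchannels()
        src_rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))

    # Downmix: average the interleaved channels of every frame.
    mono = [
        sum(samples[i:i + n_channels]) / n_channels
        for i in range(0, len(samples) - n_channels + 1, n_channels)
    ]

    # Resample: linearly interpolate between neighbouring source samples.
    n_out = int(len(mono) * target_rate / src_rate)
    out = array.array("h")
    for k in range(n_out):
        pos = k * src_rate / target_rate
        i = int(pos)
        frac = pos - i
        nxt = mono[i + 1] if i + 1 < len(mono) else mono[i]
        value = mono[i] * (1 - frac) + nxt * frac
        # Clamp to the signed 16-bit range before storing.
        out.append(max(-32768, min(32767, round(value))))

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_rate)
        dst.writeframes(out.tobytes())
    return n_out
```

In line with the second point, any noise removal would be done on the original file first, and this conversion run last.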

I've not tested this yet, but I also wonder whether an AI-generated audio clip, used as the sample, would come out sounding as good as genuine audio of a real person. There could be diminishing returns, with quality degrading each time generated audio is fed back in.

My current experience is that the better the sample, the more the output sounds like the original person, including their accent, nuances etc.

EDIT - Changed the suggested Hz

@kanttouchthis
Owner

The model samples at 24 kHz mono, so that's probably what you want your source audio to be.
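
A quick way to check whether a clip already matches that format, sketched with Python's standard `wave` module. It only reads WAV headers, and the helper names `describe_wav` and `matches_thread_settings` are hypothetical, not part of this project.

```python
import wave


def describe_wav(path):
    """Return the basic format properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / rate,
        }


def matches_thread_settings(info):
    """True if the clip is mono, 24000 Hz, 16-bit, as suggested in this thread."""
    return (
        info["channels"] == 1
        and info["sample_rate_hz"] == 24000
        and info["bits_per_sample"] == 16
    )
```

The `duration_s` value also makes it easy to see whether a clip falls in the 5-9 second range mentioned earlier.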
