Code-switched speech has different voices #10
Comments
@c9412600 You mean that even after changing
I set speaker_no = 0 and lang = 0, and the speaker in the audio changed in the middle. I would like to ask whether the number of training steps is insufficient or the parameter settings are wrong.
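(For reference, the speaker and language IDs are normally passed once per utterance and condition the entire synthesized audio. Below is a minimal inference sketch; the names speaker_no, lang, text_to_sequence, and the model.inference(...) signature are illustrative assumptions, not this repo's confirmed API.)

```python
import torch

# Stand-in front end: map characters to integer symbol IDs just to keep the
# sketch self-contained; the actual repo has its own text_to_sequence.
def text_to_sequence(text):
    return [ord(c) % 256 for c in text]

def synthesize(model, text, speaker_no=0, lang=0):
    sequence = torch.LongTensor(text_to_sequence(text)).unsqueeze(0)  # [1, T_text]
    speaker_id = torch.LongTensor([speaker_no])  # one speaker embedding for the whole utterance
    lang_id = torch.LongTensor([lang])           # one language embedding for the whole utterance
    with torch.no_grad():
        # Assumed signature: mel outputs, postnet outputs, gate outputs, attention alignments.
        mel, mel_postnet, gate, alignments = model.inference(sequence, speaker_id, lang_id)
    return mel_postnet
```

With a fixed speaker_no the voice should stay constant across the whole utterance; if it still flips when the language changes, the model has likely entangled speaker identity with language.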
@c9412600 It is quite interesting how the voice changed in between. If you loaded pre-trained Tacotron 2 weights, as in the repo, you can try training up to 40-60k steps; I don't think there will be much improvement after that. If you haven't loaded T2 weights, then you'd require more steps, around as many as mentioned in the paper.
@Jeevesh8 I haven't loaded pre-trained T2 weights. I will continue training for a longer time and will report the results later. Thank you for your help!
Thank you for the feedback @c9412600 :) If you want to load pre-trained weights in the future, you can just provide the T2 checkpoint in
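(For illustration, warm-starting from a T2 checkpoint often amounts to copying over every tensor whose name and shape match the new model. A rough sketch, assuming a standard PyTorch checkpoint with a "state_dict" key; this is not necessarily how this repo does it.)

```python
import torch

def warm_start_from_t2(model, checkpoint_path):
    # Load a pre-trained Tacotron 2 checkpoint and copy every parameter whose
    # name and shape match the current multi-speaker / multi-lingual model.
    ckpt = torch.load(checkpoint_path, map_location="cpu")
    pretrained = ckpt.get("state_dict", ckpt)
    own_state = model.state_dict()
    matched = {k: v for k, v in pretrained.items()
               if k in own_state and v.shape == own_state[k].shape}
    own_state.update(matched)
    model.load_state_dict(own_state)
    print("Warm-started %d of %d tensors from %s"
          % (len(matched), len(own_state), checkpoint_path))
    return model
```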
Got it! I will keep trying.
@Jeevesh8 There is one more thing I forgot to ask you about. Did this phenomenon occur during your training, i.e. different people's voices in the same sentence? If not, how did you set up your dataset?
@c9412600 No, this phenomenon certainly didn't occur during my training, and I don't set up my dataset in any special way. How frequently did you observe this phenomenon? In every audio you generated, or only in a few?
@Jeevesh8 Most of them show this phenomenon. Maybe it is because my dataset has only one Chinese speaker and one English speaker. What is the composition of your dataset? How many speakers? How many languages?
@Jeevesh8 Have you solved your problem? The ST-CMDS dataset has more speakers; have you tried it?
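(As an aside, the usual way to avoid tying a voice to a language is to include several speakers per language in the training filelist, which is what a larger corpus like ST-CMDS would provide. The layout below is a hypothetical sketch: the pipe-separated fields, file names, and IDs are invented for illustration, not this repo's actual metadata format.)

```python
# Hypothetical multi-speaker, multi-lingual filelist: path | transcript | speaker_id | lang_id.
# Having more than one speaker per language discourages the model from using
# speaker identity as a proxy for language.
entries = [
    ("zh/spk0/0001.wav", "今天天气很好", 0, 0),
    ("zh/spk1/0001.wav", "我们明天去公园", 1, 0),
    ("en/spk2/0001.wav", "the weather is nice today", 2, 1),
    ("en/spk3/0001.wav", "we will go to the park", 3, 1),
]

with open("filelist.txt", "w", encoding="utf-8") as f:
    for path, text, speaker_id, lang_id in entries:
        f.write("%s|%s|%d|%d\n" % (path, text, speaker_id, lang_id))
```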
I used your model. The experiment used the open-source Biaobei dataset and the LJSpeech dataset. After 22,000 training steps it successfully synthesized mixed Chinese-English speech, but the Chinese parts sound like the Biaobei speaker and the English parts sound like the LJSpeech speaker.
Is the number of training steps insufficient?
Thanks