Code-switched speech has different voices #10


Open
c9412600 opened this issue Dec 23, 2020 · 10 comments


c9412600 commented Dec 23, 2020

I used your model. My experiment used the open-source Biaobei and LJSpeech datasets. After 22,000 training steps it successfully synthesized mixed Chinese and English speech, but the Chinese audio sounds like the Biaobei voice and the English audio sounds like the LJSpeech voice.
Is the number of training steps insufficient?
Thanks

@Jeevesh8
Member

@c9412600 You mean that even after changing the speaker_no you input to the model, the voice remains unchanged for the same sentence? Could you attach the audio, if possible?

Author

c9412600 commented Dec 23, 2020

I set speaker_no = 0 and lang = 0, with the following lines:
lj*.mel.npy/THE PRESIDENT ALMOST COMPLETELY BLOCKED OSWALD'S VIEW OF THE GOVERNOR #3 PRIOR TO THE TIME THE FIRST SHOT STRUCK THE PRESIDENT.|1|1
biaobei*.mel.npy/ta1 ti2 chu1 zhen3 duan4 dan1 biao3 shi4 zi4 ji3 de5 zuo3 xi1 you4 shou3 zhou3 dou1 cuo4 shang1.|0|0
and the synthesized wav file is attached:
test.zip

The speaker changes in the middle of the audio. I would like to ask whether the number of training steps is insufficient or a parameter setting is wrong.
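
For reference, each line above appears to follow the pattern `mel_path/transcript|speaker_id|language_id`. A minimal sketch of parsing it (the helper below is illustrative only, not the repo's actual data loader):

```python
# Illustrative parser for lines of the form
#   <mel_path>/<transcript>|<speaker_id>|<language_id>
# (layout inferred from the lines above; hypothetical, not the repo's loader)

def parse_metadata_line(line: str):
    # Split off the two trailing integer fields; the rest is path + transcript.
    path_and_text, speaker_id, language_id = line.rsplit("|", 2)
    return path_and_text, int(speaker_id), int(language_id)

example = "biaobei*.mel.npy/ta1 ti2 chu1 zhen3 duan4 dan1 biao3 shi4 zi4 ji3 de5 zuo3 xi1 you4 shou3 zhou3 dou1 cuo4 shang1.|0|0"
text, speaker, lang = parse_metadata_line(example)
print(speaker, lang)  # -> 0 0 (Biaobei speaker, Chinese)
```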

@Jeevesh8
Member

@c9412600 It is quite interesting how the voice changes partway through. If you loaded pre-trained Tacotron2 weights, as in the repo, you can try training up to 40-60k steps; I don't think there will be much improvement after that.

If you haven't loaded T2 weights, then you'll need more steps, around as many as mentioned in the paper.

@c9412600
Author

@Jeevesh8 I haven't loaded pre-trained T2 weights. I will continue training for a longer time and report the results later. Thank you for your help!

@Jeevesh8
Member

Thank you for the feedback, @c9412600 :)

If you want to load pre-trained weights in the future, you can just provide the T2 checkpoint via the --checkpoint_path argument.
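
For example, the command might look like this (a sketch only; `train.py` and the checkpoint filename are placeholders, and only the `--checkpoint_path` flag comes from the comment above):

```
python train.py --checkpoint_path tacotron2_statedict.pt
```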

@c9412600
Author

Got it! I will keep trying.

@c9412600
Author

@Jeevesh8 There is one more thing I forgot to ask you about. Did this phenomenon, different people's voices within the same sentence, occur during your training? If not, how did you set up your dataset?

@Jeevesh8
Member

@c9412600 No, this phenomenon certainly didn't occur during my training, and I don't set up my dataset in any special way. How frequently did you observe it? In every audio you generated, or only in a very few?

Author

c9412600 commented Jan 4, 2021

@Jeevesh8 Most of them have this phenomenon; maybe it is because my dataset has only one Chinese speaker and one English speaker. What is the composition of your dataset? How many speakers? How many languages?
Thanks

@mudong0419

@Jeevesh8 Has this problem been solved? The ST-CMDS dataset has more speakers; have you tried it?
