Training mb_melgan with csmsc/voc3 gives abnormal inference results #4062

Open
elliotzheng opened this issue Apr 22, 2025 · 2 comments
elliotzheng commented Apr 22, 2025

General Question

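After browsing the existing issues, I found that very few people mention vocoder training. My training result is abnormal, so I would like to ask for advice.
I trained mb_melgan with the run.sh in csmsc/voc3 without changing any parameters or code. baker_alignment_tone.tar.gz was also downloaded directly from the official source; I did not run MFA myself. For inference I used the following code: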
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    # Environment variables and arguments must be joined with line continuations.
    FLAGS_allocator_strategy=naive_best_fit \
    FLAGS_fraction_of_gpu_memory_to_use=0.01 \
    python3 ${BIN_DIR}/../synthesize_e2e.py \
        --am=speedyspeech_csmsc \
        --am_config=speedyspeech_csmsc_ckpt_0.2.0/default.yaml \
        --am_ckpt=speedyspeech_csmsc_ckpt_0.2.0/snapshot_iter_30600.pdz \
        --am_stat=speedyspeech_csmsc_ckpt_0.2.0/feats_stats.npy \
        --voc=mb_melgan_csmsc \
        --voc_config=mb_melgan_train/default.yaml \
        --voc_ckpt=mb_melgan_train/snapshot_iter_1000000.pdz \
        --voc_stat=mb_melgan_train/feats_stats.npy \
        --lang=zh \
        --text=${BIN_DIR}/../../assets/sentences.txt \
        --output_dir=${train_output_path}/test_e2e \
        --phones_dict=dump/phone_id_map.txt \
        --tones_dict=dump/tone_id_map.txt
fi
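In the command above, mb_melgan_train is the mb_melgan model I trained, and the speedyspeech model is the official download. The synthesized audio is attached below; it sounds completely wrong: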
001_train.zip

If I replace mb_melgan_train in the inference command above with the official mb_melgan_csmsc_ckpt_0.1.1, the result is normal. The audio is attached below:
001.zip
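
Since swapping only the vocoder directory changes the outcome, one low-cost check is to compare the two directories file by file. The sketch below is only an illustration: `compare_voc_dirs` is a hypothetical helper, the file names are taken from the `--voc_config` and `--voc_stat` flags above, and it assumes the official checkpoint directory uses the same layout.

```shell
#!/usr/bin/env bash
# Sketch: compare a self-trained vocoder directory against the official one.
# File names come from the --voc_* flags in the command above; that the
# official checkpoint directory has the same layout is an assumption.

compare_voc_dirs() {
    local mine="$1" official="$2" status=0 f
    for f in default.yaml feats_stats.npy; do
        if [ ! -f "${mine}/${f}" ] || [ ! -f "${official}/${f}" ]; then
            # One of the two directories lacks the file entirely.
            echo "missing: ${f}"
            status=1
        elif cmp -s "${mine}/${f}" "${official}/${f}"; then
            # Byte-for-byte identical.
            echo "identical: ${f}"
        else
            echo "differs: ${f}"
            status=1
        fi
    done
    return "${status}"
}
```

It could be called as `compare_voc_dirs mb_melgan_train mb_melgan_csmsc_ckpt_0.1.1`, followed by a plain `diff` of the two `default.yaml` files if they differ. A byte-level difference in `feats_stats.npy` is only a hint, not proof of a problem, since small floating-point differences would also show up.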

I compared the final eval losses. The first row is the official checkpoint, the second row is my training run:

| Model | Step | eval/generator_loss | eval/log_stft_magnitude_loss | eval/spectral_convergence_loss | eval/sub_log_stft_magnitude_loss | eval/sub_spectral_convergence_loss |
| --- | --- | --- | --- | --- | --- | --- |
| default | 1(gpu) x 1000000 | 2.4851 | 0.71778 | 0.2761 | 0.66334 | 0.2777 |
| mine | 1(gpu) x 1000000 | 3.455973 | 0.857833 | 0.469584 | 0.799196 | 0.469322 |
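
To make the gap easier to read, the relative difference per loss term can be computed from the numbers above. This is a minimal sketch: the values are copied from the two rows, and `loss_gap` is just an illustrative helper name.

```shell
#!/usr/bin/env bash
# Sketch: relative gap between the two rows of eval losses quoted above.
# Columns in the here-doc: loss name, official value, self-trained value.

loss_gap() {
    # Print each loss name with (mine - official) / official as a percentage.
    awk '{ printf "%s %+.1f%%\n", $1, ($3 - $2) / $2 * 100 }' <<'EOF'
eval/generator_loss 2.4851 3.455973
eval/log_stft_magnitude_loss 0.71778 0.857833
eval/spectral_convergence_loss 0.2761 0.469584
eval/sub_log_stft_magnitude_loss 0.66334 0.799196
eval/sub_spectral_convergence_loss 0.2777 0.469322
EOF
}

loss_gap
```

On these numbers, both spectral convergence terms come out roughly 70% above the official checkpoint, while the log STFT magnitude terms are about 20% higher.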

The model files are on Baidu Netdisk: https://pan.baidu.com/s/1AGMj4Qx3FRUAxJAV_f2Rbg?pwd=j4n8 (extraction code: j4n8)
The training log is here: https://pan.baidu.com/s/1Mtz_NKaDf5UWMduTksOkDw?pwd=ppme (extraction code: ppme)

I hope someone with relevant experience can take a look. Many thanks!

@zxcd zxcd self-assigned this Apr 24, 2025
Collaborator

zxcd commented Apr 24, 2025

If you trained on your own data, you could also try fine-tuning the acoustic model you plan to use.

@elliotzheng
Author

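@zxcd I used the official data. As a first step I want to reproduce the official result, so I followed the tutorial in example/csmsc/voc3 exactly, yet the mb_melgan model I trained is completely unusable. Also, I am using the official acoustic model as-is.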
