Commit 7f00996

[doc] update aishell2 u2++ transformer results (#477)
* [doc] update aishell2 u2++ transformer results
* Update train_u2++_transformer.yaml

Co-authored-by: Binbin Zhang <811364747@qq.com>

1 parent c66e860 · commit 7f00996

File tree: 2 files changed, +115 -0 lines


examples/aishell2/s0/README.md

Lines changed: 15 additions & 0 deletions

@@ -15,6 +15,21 @@
 | attention rescoring      | 5.39 | 5.78 |
 | LM + attention rescoring | 5.35 | 5.73 |

+## U2++ Transformer Result
+
+* Feature info: using fbank feature, with cmvn, no speed perturb
+* Training info: lr 0.002, batch size 22, 8 gpus, acc_grad 1, 240 epochs, dither 0.0
+* Decoding info: ctc_weight 0.1, reverse_weight 0.5, average_num 30
+* Git hash: 5a1342312668e7a5abb83aed1e53256819cebf95
+* Model link: http://mobvoi-speech-public.ufile.ucloud.cn/public/wenet/aishell2/20210621_u2pp_transformer_exp.tar.gz
+
+| decoding mode/chunk size  | full | 16   |
+|---------------------------|------|------|
+| ctc greedy search         | 7.35 | 8.23 |
+| ctc prefix beam search    | 7.36 | 8.23 |
+| attention rescoring       | 6.09 | 6.70 |
+| LM + attention rescoring  | 6.07 | 6.55 |
+
 ## Unified Conformer Result

 * Feature info: using fbank feature, with cmvn, no speed perturb.
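The "Decoding info" bullet above fixes ctc_weight 0.1, reverse_weight 0.5, and average_num 30. Below is a minimal sketch of how the first two weights are typically fused during attention rescoring of a CTC n-best list; the hypothesis names and scores are made-up numbers for illustration, not outputs of the released model.

```python
# Hedged sketch of attention rescoring with the weights from "Decoding info":
# each CTC n-best hypothesis carries a left-to-right and a right-to-left
# attention score plus its CTC score, and the weighted sum picks the winner.
ctc_weight = 0.1      # weight of the CTC score in the final ranking
reverse_weight = 0.5  # weight of the right-to-left (r2l) decoder score

# (l2r attention score, r2l attention score, ctc score) per hypothesis;
# the values are illustrative only.
nbest = {
    "hyp_a": (-12.3, -12.9, -15.1),
    "hyp_b": (-12.7, -12.1, -14.8),
}

def fused_score(l2r, r2l, ctc):
    # Mix the two attention directions, then add the weighted CTC score.
    attn = (1.0 - reverse_weight) * l2r + reverse_weight * r2l
    return attn + ctc_weight * ctc

best = max(nbest, key=lambda h: fused_score(*nbest[h]))
print(best, fused_score(*nbest[best]))
```

average_num 30 refers to averaging 30 saved checkpoints into the model used for decoding; it does not enter the score fusion above.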
train_u2++_transformer.yaml

Lines changed: 100 additions & 0 deletions

@@ -0,0 +1,100 @@
# network architecture
# encoder related
encoder: transformer
encoder_conf:
    output_size: 256    # dimension of attention
    attention_heads: 4
    linear_units: 2048  # the number of units of position-wise feed forward
    num_blocks: 12      # the number of encoder blocks
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.0
    input_layer: conv2d # encoder architecture type
    normalize_before: true
    use_dynamic_chunk: true
    use_dynamic_left_chunk: false

# decoder related
decoder: bitransformer
decoder_conf:
    attention_heads: 4
    linear_units: 2048
    num_blocks: 3
    r_num_blocks: 3
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.0
    src_attention_dropout_rate: 0.0

# hybrid CTC/attention
model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1     # label smoothing option
    length_normalized_loss: false
    reverse_weight: 0.3

# use raw_wav or kaldi feature
raw_wav: true

# feature extraction
collate_conf:
    # waveform level config
    wav_distortion_conf:
        wav_dither: 1.0
        wav_distortion_rate: 0.0
        distortion_methods: []
    speed_perturb: false
    feature_extraction_conf:
        feature_type: 'fbank'
        mel_bins: 80
        frame_shift: 10

# feature extraction
collate_conf:
    # waveform level config
    wav_distortion_conf:
        wav_dither: 0.0
        wav_distortion_rate: 0.0
        distortion_methods: []
    speed_perturb: false
    feature_extraction_conf:
        feature_type: 'fbank'
        mel_bins: 80
        frame_shift: 10
        frame_length: 25
        using_pitch: false
    # spec level config
    # spec_swap: false
    feature_dither: 0.0 # add dither [-feature_dither,feature_dither] on fbank feature
    spec_aug: true
    spec_aug_conf:
        warp_for_time: False
        num_t_mask: 2
        num_f_mask: 2
        max_t: 50
        max_f: 10
        max_w: 80
    spec_sub: true
    spec_sub_conf:
        num_t_sub: 3
        max_t: 20

# dataset related
dataset_conf:
    max_length: 40960
    min_length: 0
    batch_type: 'static' # static or dynamic
    # the size of batch_size should be set according to your gpu memory size, here we used titan xp gpu whose memory size is 12GB
    batch_size: 22
    sort: true

grad_clip: 5
accum_grad: 1
max_epoch: 240
log_interval: 100

optim: adam
optim_conf:
    lr: 0.002
scheduler: warmuplr     # pytorch v1.1.0+ required
scheduler_conf:
    warmup_steps: 25000
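As a quick sanity check on the new config, here is a minimal sketch of loading it with PyYAML; the local filename and the printed fields are illustrative, not part of the commit.

```python
import yaml

# Load the training config added by this commit (local path is an assumption).
with open("train_u2++_transformer.yaml") as f:
    conf = yaml.safe_load(f)

# Note: `collate_conf` appears twice at the top level of the file; with
# PyYAML's default loader the later mapping wins, so the block with
# wav_dither: 0.0 is the one that ends up in `conf`.
print(conf["encoder"], conf["encoder_conf"]["num_blocks"])        # transformer 12
print(conf["decoder"], conf["decoder_conf"]["r_num_blocks"])      # bitransformer 3
print(conf["model_conf"]["ctc_weight"],
      conf["model_conf"]["reverse_weight"])                       # 0.3 0.3
print(conf["collate_conf"]["wav_distortion_conf"]["wav_dither"])  # 0.0
```

Note that the training-time weights here (ctc_weight 0.3, reverse_weight 0.3 in model_conf) are distinct from the decoding-time weights reported in the README above (ctc_weight 0.1, reverse_weight 0.5).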
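The optimizer section pairs lr 0.002 with scheduler: warmuplr and warmup_steps: 25000. In the usual WeNet/ESPnet setup this denotes a Noam-style schedule: linear warmup to the base lr over warmup_steps, then inverse-square-root decay. The sketch below is written under that assumption rather than quoted from the repository.

```python
def warmup_lr(step, base_lr=0.002, warmup_steps=25000):
    # Assumed Noam-style schedule: linear warmup to base_lr at
    # step == warmup_steps, then decay proportional to step ** -0.5.
    step = max(step, 1)  # avoid 0 ** -0.5 on the very first update
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5,
                                               step * warmup_steps ** -1.5)

for s in (1000, 25000, 100000):
    print(s, round(warmup_lr(s), 6))  # peaks at 0.002 when step == 25000
```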

0 commit comments
