eric102004 committed on
Commit 7f1571e · 1 Parent(s): 2658ec6

Update model

Files changed (23)
  1. README.md +402 -3
  2. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/RESULTS.md +64 -0
  3. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/config.yaml +264 -0
  4. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/acc.png +0 -0
  5. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/backward_time.png +0 -0
  6. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/cer.png +0 -0
  7. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/cer_ctc.png +0 -0
  8. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/clip.png +0 -0
  9. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/forward_time.png +0 -0
  10. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/gpu_max_alloc_mem_GB.png +0 -0
  11. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/gpu_max_cached_mem_GB.png +0 -0
  12. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/grad_norm.png +0 -0
  13. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/iter_time.png +0 -0
  14. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/loss.png +0 -0
  15. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/loss_att.png +0 -0
  16. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/loss_ctc.png +0 -0
  17. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/loss_scale.png +0 -0
  18. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/optim0_lr0.png +0 -0
  19. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/optim_step_time.png +0 -0
  20. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/train_time.png +0 -0
  21. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/wer.png +0 -0
  22. exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/valid.cer.ave_10best.pth +3 -0
  23. meta.yaml +8 -0
README.md CHANGED
@@ -1,3 +1,402 @@
- ---
- license: cc-by-4.0
- ---
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: en
+ datasets:
+ - myst_ogi_cmu_kids
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `espnet/myst_ogi_cmu_kids_wavlm_aed`
+
+ This model was trained by eric102004 using myst_ogi_cmu_kids recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ ```bash
+ cd espnet
+ git checkout 6f722aee1f9593572d5eddfd8cac7075b07cf9ca
+ pip install -e .
+ cd egs2/myst_ogi_cmu_kids/asr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/myst_ogi_cmu_kids_wavlm_aed
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Tue Feb 18 10:11:23 CST 2025`
+ - python version: `3.12.3 | packaged by Anaconda, Inc. | (main, May 6 2024, 19:46:43) [GCC 11.2.0]`
+ - espnet version: `espnet 202412`
+ - pytorch version: `pytorch 2.4.0`
+ - Git hash: `6f722aee1f9593572d5eddfd8cac7075b07cf9ca`
+ - Commit date: `Thu Feb 6 22:32:07 2025 -0600`
+
+ ## exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/decode_asr_asr_model_valid.cer.ave_10best
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_cmu/dev|237|2170|91.7|6.5|1.8|2.5|10.9|45.1|
+ |data_cmu/test|475|4287|90.2|7.5|2.3|2.0|11.8|47.8|
+ |data_jibo/dev|853|853|29.8|70.2|0.0|189.4|259.7|88.0|
+ |data_jibo/test|1044|1043|29.4|70.6|0.0|259.6|330.2|86.6|
+ |data_myst/dev|9037|153273|90.8|6.1|3.1|2.5|11.7|62.7|
+ |data_myst/test|10311|182712|88.7|6.8|4.5|3.0|14.2|61.9|
+ |data_ogi_scripted/dev|5426|15375|93.1|5.8|1.1|0.5|7.4|13.2|
+ |data_ogi_scripted/test|15945|45419|90.8|7.9|1.3|1.2|10.4|17.7|
+ |data_ogi_spon/dev|349|13561|82.3|10.7|7.1|3.5|21.3|95.4|
+ |data_ogi_spon/test|1095|38811|83.4|10.5|6.1|4.3|20.9|95.5|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_cmu/dev|237|11449|96.8|1.2|2.0|2.4|5.6|45.1|
+ |data_cmu/test|475|22664|95.7|1.3|3.0|2.0|6.4|47.8|
+ |data_jibo/dev|853|2014|65.7|31.6|2.7|423.9|458.2|88.0|
+ |data_jibo/test|1044|2767|69.1|27.9|3.0|516.7|547.6|86.6|
+ |data_myst/dev|9037|763728|95.7|1.3|3.0|2.4|6.8|62.7|
+ |data_myst/test|10311|911898|94.0|1.6|4.4|3.0|8.9|61.9|
+ |data_ogi_scripted/dev|5426|83141|96.6|1.8|1.7|0.9|4.3|13.2|
+ |data_ogi_scripted/test|15945|244467|95.4|2.4|2.2|1.6|6.2|17.7|
+ |data_ogi_spon/dev|349|58255|89.1|3.1|7.8|4.4|15.3|95.4|
+ |data_ogi_spon/test|1095|165977|90.6|2.9|6.6|5.0|14.5|95.5|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ ## exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/decode_asr_jibo_asr_model_valid.cer.ave_10best
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_jibo/dev|853|853|12.3|87.7|0.0|1.1|88.7|88.3|
+ |data_jibo/test|1044|1043|10.6|89.4|0.0|1.7|91.1|90.3|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_jibo/dev|853|2014|22.1|22.7|55.2|1.4|79.3|88.3|
+ |data_jibo/test|1044|2767|21.3|20.6|58.1|2.1|80.8|90.3|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/wavlm/train_asr_wavlm_transformer_lr03.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: false
+ dry_run: false
+ iterator_type: sequence
+ valid_iterator_type: null
+ output_dir: exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter
+ ngpu: 1
+ seed: 2022
+ num_workers: 4
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ use_deepspeed: false
+ deepspeed_config: null
+ static_graph: false
+ gradient_as_bucket_view: false
+ broadcast_buffers: true
+ bucket_cap_mb: 25
+ compress_gradients: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: false
+ use_tf32: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 40
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - cer
+     - min
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 16
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: 400
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_adapter: false
+ adapter: lora
+ save_strategy: all
+ adapter_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param:
+ - frontend.upstream
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 4000000
+ valid_batch_bins: null
+ category_sample_size: 10
+ train_shape_file:
+ - exp/asr_stats_raw_en_char/train/speech_shape
+ - exp/asr_stats_raw_en_char/train/text_shape.char
+ valid_shape_file:
+ - exp/asr_stats_raw_en_char/valid/speech_shape
+ - exp/asr_stats_raw_en_char/valid/text_shape.char
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ validate_each_iter_factory: true
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ chunk_max_abs_length: null
+ chunk_discard_short_samples: true
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/dev/text
+     - text
+     - text
+ multi_task_dataset: false
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.03
+     weight_decay: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 15000
+ token_list:
+ - <blank>
+ - <unk>
+ - <space>
+ - E
+ - T
+ - A
+ - O
+ - I
+ - N
+ - H
+ - S
+ - R
+ - L
+ - D
+ - U
+ - W
+ - M
+ - C
+ - G
+ - Y
+ - B
+ - P
+ - F
+ - K
+ - ''''
+ - V
+ - X
+ - J
+ - Z
+ - Q
+ - ','
+ - '-'
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+     brctc_risk_strategy: exp
+     brctc_group_strategy: end
+     brctc_risk_factor: 0.0
+ joint_net_conf: null
+ use_preprocessor: true
+ use_lang_prompt: false
+ use_nlp_prompt: false
+ token_type: char
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ aux_ctc_tasks: []
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: wavlm_large
+     download_dir: ./hub
+     multilayer_feature: true
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     num_time_mask: 5
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ preencoder: linear
+ preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ encoder: transformer
+ encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     linear_units: 1024
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     input_layer: conv2d2
+     normalize_before: true
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.1
+     src_attention_dropout_rate: 0.1
+ preprocessor: default
+ preprocessor_conf: {}
+ masker: null
+ masker_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202412'
+ distributed: false
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
+ year={2018},
+ booktitle={Proceedings of Interspeech},
+ pages={2207--2211},
+ doi={10.21437/Interspeech.2018-1456},
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+ title={ESPnet: End-to-End Speech Processing Toolkit},
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ year={2018},
+ eprint={1804.00015},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
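The config in this README pairs `optim_conf.lr: 0.03` with `scheduler: warmuplr` and `warmup_steps: 15000`. A minimal sketch of what that schedule implies, assuming `warmuplr` follows the usual Noam-style rule (linear ramp up to the base rate at `warmup_steps`, then inverse-square-root decay). The function name `warmup_lr` is illustrative and not part of ESPnet.

```python
def warmup_lr(step: int, base_lr: float = 0.03, warmup_steps: int = 15000) -> float:
    """Noam-style warmup (assumed form): ramp linearly, peak at warmup_steps, then decay."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # min() selects the linear ramp before warmup_steps and the 1/sqrt(step) decay after it.
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    for step in (1, 7500, 15000, 60000):
        print(f"step {step:>6}: lr = {warmup_lr(step):.6f}")
```

Under this assumption, 0.03 is the peak learning rate reached at step 15000 rather than a constant rate, which is why such a large nominal value can be used with Adam.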
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/RESULTS.md ADDED
@@ -0,0 +1,64 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Tue Feb 18 10:11:23 CST 2025`
+ - python version: `3.12.3 | packaged by Anaconda, Inc. | (main, May 6 2024, 19:46:43) [GCC 11.2.0]`
+ - espnet version: `espnet 202412`
+ - pytorch version: `pytorch 2.4.0`
+ - Git hash: `6f722aee1f9593572d5eddfd8cac7075b07cf9ca`
+ - Commit date: `Thu Feb 6 22:32:07 2025 -0600`
+
+ ## exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/decode_asr_asr_model_valid.cer.ave_10best
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_cmu/dev|237|2170|91.7|6.5|1.8|2.5|10.9|45.1|
+ |data_cmu/test|475|4287|90.2|7.5|2.3|2.0|11.8|47.8|
+ |data_jibo/dev|853|853|29.8|70.2|0.0|189.4|259.7|88.0|
+ |data_jibo/test|1044|1043|29.4|70.6|0.0|259.6|330.2|86.6|
+ |data_myst/dev|9037|153273|90.8|6.1|3.1|2.5|11.7|62.7|
+ |data_myst/test|10311|182712|88.7|6.8|4.5|3.0|14.2|61.9|
+ |data_ogi_scripted/dev|5426|15375|93.1|5.8|1.1|0.5|7.4|13.2|
+ |data_ogi_scripted/test|15945|45419|90.8|7.9|1.3|1.2|10.4|17.7|
+ |data_ogi_spon/dev|349|13561|82.3|10.7|7.1|3.5|21.3|95.4|
+ |data_ogi_spon/test|1095|38811|83.4|10.5|6.1|4.3|20.9|95.5|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_cmu/dev|237|11449|96.8|1.2|2.0|2.4|5.6|45.1|
+ |data_cmu/test|475|22664|95.7|1.3|3.0|2.0|6.4|47.8|
+ |data_jibo/dev|853|2014|65.7|31.6|2.7|423.9|458.2|88.0|
+ |data_jibo/test|1044|2767|69.1|27.9|3.0|516.7|547.6|86.6|
+ |data_myst/dev|9037|763728|95.7|1.3|3.0|2.4|6.8|62.7|
+ |data_myst/test|10311|911898|94.0|1.6|4.4|3.0|8.9|61.9|
+ |data_ogi_scripted/dev|5426|83141|96.6|1.8|1.7|0.9|4.3|13.2|
+ |data_ogi_scripted/test|15945|244467|95.4|2.4|2.2|1.6|6.2|17.7|
+ |data_ogi_spon/dev|349|58255|89.1|3.1|7.8|4.4|15.3|95.4|
+ |data_ogi_spon/test|1095|165977|90.6|2.9|6.6|5.0|14.5|95.5|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ ## exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/decode_asr_jibo_asr_model_valid.cer.ave_10best
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_jibo/dev|853|853|12.3|87.7|0.0|1.1|88.7|88.3|
+ |data_jibo/test|1044|1043|10.6|89.4|0.0|1.7|91.1|90.3|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_jibo/dev|853|2014|22.1|22.7|55.2|1.4|79.3|88.3|
+ |data_jibo/test|1044|2767|21.3|20.6|58.1|2.1|80.8|90.3|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
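In the RESULTS tables, the `Err` column is the sum of the `Sub`, `Del` and `Ins` percentages, each measured against the number of reference tokens. Because insertions are not bounded by the reference length, `Err` can exceed 100%, as in the `data_jibo` rows of the first decode. A small sketch that checks this on three rows copied from the WER tables; the helper name `error_rate` is illustrative, and the 0.1-point gaps are assumed to come from rounding each column separately.

```python
def error_rate(sub: float, dele: float, ins: float) -> float:
    """Total error percentage as reported in the Err column: Sub + Del + Ins."""
    return sub + dele + ins


# (Sub, Del, Ins, reported Err) copied from the WER tables in RESULTS.md.
ROWS = {
    "data_cmu/dev": (6.5, 1.8, 2.5, 10.9),
    "data_jibo/dev (decode_asr)": (70.2, 0.0, 189.4, 259.7),
    "data_jibo/dev (decode_asr_jibo)": (87.7, 0.0, 1.1, 88.7),
}

if __name__ == "__main__":
    for name, (sub, dele, ins, reported) in ROWS.items():
        total = error_rate(sub, dele, ins)
        print(f"{name}: computed {total:.1f}, reported {reported:.1f}")
```

Read this way, the 259.7% WER on `data_jibo/dev` is dominated by the 189.4% insertion rate, while the `decode_asr_jibo` configuration nearly removes insertions (1.1%) and leaves substitutions as the main error source.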
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/config.yaml ADDED
@@ -0,0 +1,264 @@
+ config: conf/tuning/wavlm/train_asr_wavlm_transformer_lr03.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: false
+ dry_run: false
+ iterator_type: sequence
+ valid_iterator_type: null
+ output_dir: exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter
+ ngpu: 1
+ seed: 2022
+ num_workers: 4
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ use_deepspeed: false
+ deepspeed_config: null
+ static_graph: false
+ gradient_as_bucket_view: false
+ broadcast_buffers: true
+ bucket_cap_mb: 25
+ compress_gradients: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: false
+ use_tf32: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 40
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - cer
+     - min
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 16
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: 400
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_adapter: false
+ adapter: lora
+ save_strategy: all
+ adapter_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param:
+ - frontend.upstream
+ num_iters_per_epoch: null
+ batch_size: 20
82
+ valid_batch_size: null
83
+ batch_bins: 4000000
84
+ valid_batch_bins: null
85
+ category_sample_size: 10
86
+ train_shape_file:
87
+ - exp/asr_stats_raw_en_char/train/speech_shape
88
+ - exp/asr_stats_raw_en_char/train/text_shape.char
89
+ valid_shape_file:
90
+ - exp/asr_stats_raw_en_char/valid/speech_shape
91
+ - exp/asr_stats_raw_en_char/valid/text_shape.char
92
+ batch_type: numel
93
+ valid_batch_type: null
94
+ fold_length:
95
+ - 80000
96
+ - 150
97
+ sort_in_batch: descending
98
+ shuffle_within_batch: false
99
+ sort_batch: descending
100
+ multiple_iterator: false
101
+ validate_each_iter_factory: true
102
+ chunk_length: 500
103
+ chunk_shift_ratio: 0.5
104
+ num_cache_chunks: 1024
105
+ chunk_excluded_key_prefixes: []
106
+ chunk_default_fs: null
107
+ chunk_max_abs_length: null
108
+ chunk_discard_short_samples: true
109
+ train_data_path_and_name_and_type:
110
+ - - dump/raw/train/wav.scp
111
+ - speech
112
+ - sound
113
+ - - dump/raw/train/text
114
+ - text
115
+ - text
116
+ valid_data_path_and_name_and_type:
117
+ - - dump/raw/dev/wav.scp
118
+ - speech
119
+ - sound
120
+ - - dump/raw/dev/text
121
+ - text
122
+ - text
123
+ multi_task_dataset: false
124
+ allow_variable_data_keys: false
125
+ max_cache_size: 0.0
126
+ max_cache_fd: 32
127
+ allow_multi_rates: false
128
+ valid_max_cache_size: null
129
+ exclude_weight_decay: false
130
+ exclude_weight_decay_conf: {}
131
+ optim: adam
132
+ optim_conf:
133
+ lr: 0.03
134
+ weight_decay: 1.0e-06
135
+ scheduler: warmuplr
136
+ scheduler_conf:
137
+ warmup_steps: 15000
138
+ token_list:
139
+ - <blank>
140
+ - <unk>
141
+ - <space>
142
+ - E
143
+ - T
144
+ - A
145
+ - O
146
+ - I
147
+ - N
148
+ - H
149
+ - S
150
+ - R
151
+ - L
152
+ - D
153
+ - U
154
+ - W
155
+ - M
156
+ - C
157
+ - G
158
+ - Y
159
+ - B
160
+ - P
161
+ - F
162
+ - K
163
+ - ''''
164
+ - V
165
+ - X
166
+ - J
167
+ - Z
168
+ - Q
169
+ - ','
170
+ - '-'
171
+ - <sos/eos>
172
+ init: null
173
+ input_size: null
174
+ ctc_conf:
175
+ dropout_rate: 0.0
176
+ ctc_type: builtin
177
+ reduce: true
178
+ ignore_nan_grad: null
179
+ zero_infinity: true
180
+ brctc_risk_strategy: exp
181
+ brctc_group_strategy: end
182
+ brctc_risk_factor: 0.0
183
+ joint_net_conf: null
184
+ use_preprocessor: true
185
+ use_lang_prompt: false
186
+ use_nlp_prompt: false
187
+ token_type: char
188
+ bpemodel: null
189
+ non_linguistic_symbols: null
190
+ cleaner: null
191
+ g2p: null
192
+ speech_volume_normalize: null
193
+ rir_scp: null
194
+ rir_apply_prob: 1.0
195
+ noise_scp: null
196
+ noise_apply_prob: 1.0
197
+ noise_db_range: '13_15'
198
+ short_noise_thres: 0.5
199
+ aux_ctc_tasks: []
200
+ frontend: s3prl
201
+ frontend_conf:
202
+ frontend_conf:
203
+ upstream: wavlm_large
204
+ download_dir: ./hub
205
+ multilayer_feature: true
206
+ fs: 16k
207
+ specaug: specaug
208
+ specaug_conf:
209
+ apply_time_warp: true
210
+ time_warp_window: 5
211
+ time_warp_mode: bicubic
212
+ apply_freq_mask: true
213
+ freq_mask_width_range:
214
+ - 0
215
+ - 27
216
+ num_freq_mask: 2
217
+ apply_time_mask: true
218
+ time_mask_width_ratio_range:
219
+ - 0.0
220
+ - 0.05
221
+ num_time_mask: 5
222
+ normalize: utterance_mvn
223
+ normalize_conf: {}
224
+ model: espnet
225
+ model_conf:
226
+ ctc_weight: 0.3
227
+ lsm_weight: 0.1
228
+ length_normalized_loss: false
229
+ extract_feats_in_collect_stats: false
230
+ preencoder: linear
231
+ preencoder_conf:
232
+ input_size: 1024
233
+ output_size: 80
234
+ encoder: transformer
235
+ encoder_conf:
236
+ output_size: 256
237
+ attention_heads: 4
238
+ linear_units: 1024
239
+ num_blocks: 12
240
+ dropout_rate: 0.1
241
+ positional_dropout_rate: 0.1
242
+ attention_dropout_rate: 0.1
243
+ input_layer: conv2d2
244
+ normalize_before: true
245
+ postencoder: null
246
+ postencoder_conf: {}
247
+ decoder: transformer
248
+ decoder_conf:
249
+ attention_heads: 4
250
+ linear_units: 2048
251
+ num_blocks: 6
252
+ dropout_rate: 0.1
253
+ positional_dropout_rate: 0.1
254
+ self_attention_dropout_rate: 0.1
255
+ src_attention_dropout_rate: 0.1
256
+ preprocessor: default
257
+ preprocessor_conf: {}
258
+ masker: null
259
+ masker_conf: {}
260
+ required:
261
+ - output_dir
262
+ - token_list
263
+ version: '202412'
264
+ distributed: false
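The `model_conf.ctc_weight: 0.3` entry in config.yaml sets how the CTC and attention-decoder losses are mixed during training (the `loss_ctc.png` and `loss_att.png` plots track the two terms). A minimal sketch of that interpolation, assuming the standard hybrid CTC/attention objective; `joint_loss` is a hypothetical helper, not an ESPnet function, and the loss values in the usage line are made up for illustration.

```python
def joint_loss(loss_ctc: float, loss_att: float, ctc_weight: float = 0.3) -> float:
    """Hybrid objective (assumed form): ctc_weight * CTC + (1 - ctc_weight) * attention."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from this training run.
    print(joint_loss(loss_ctc=10.0, loss_att=20.0))
```

With a weight of 0.3, the attention decoder contributes 70% of the training signal, which matches the `wavlm_aed` (attention encoder-decoder) naming of the model.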
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/acc.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/backward_time.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/cer.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/cer_ctc.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/clip.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/forward_time.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/gpu_max_alloc_mem_GB.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/grad_norm.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/iter_time.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/loss.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/loss_att.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/loss_ctc.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/loss_scale.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/optim0_lr0.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/optim_step_time.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/train_time.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/images/wer.png ADDED
exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/valid.cer.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6663816b215a750848a70baff25f1aae1d14370876f328a5693946a4e14c06b1
+ size 1350471000
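The three lines added for `valid.cer.ave_10best.pth` are a Git LFS pointer, not the checkpoint itself: the repository stores only the spec version, the SHA-256 object id and the byte size. A short sketch that parses the pointer shown above and reports the checkpoint size; `parse_lfs_pointer` is a hypothetical helper written for this note.

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6663816b215a750848a70baff25f1aae1d14370876f328a5693946a4e14c06b1
size 1350471000
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a typed dictionary."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER)
    print(f"{info['algo']} {info['digest'][:12]}... {info['size'] / 1e9:.2f} GB")
```

The size field shows that the averaged checkpoint is about 1.35 GB, so a plain `git clone` without LFS support would fetch only this pointer.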
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202412'
+ files:
+   asr_model_file: exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/valid.cer.ave_10best.pth
+ python: 3.12.3 | packaged by Anaconda, Inc. | (main, May 6 2024, 19:46:43) [GCC 11.2.0]
+ timestamp: 1756005750.923739
+ torch: 2.4.0
+ yaml_files:
+   asr_train_config: exp/asr_train_asr_wavlm_transformer_lr03_raw_en_char_dur05_filter/config.yaml
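The `timestamp` field in meta.yaml is easier to read as a calendar date. A small sketch, assuming the value is Unix epoch seconds (the file itself does not state the unit); `to_utc_iso` is an illustrative helper name.

```python
from datetime import datetime, timezone


def to_utc_iso(ts: float) -> str:
    """Render an epoch-seconds timestamp as an ISO 8601 UTC string, dropping sub-seconds."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(timespec="seconds")


if __name__ == "__main__":
    print(to_utc_iso(1756005750.923739))
```

Under that assumption the value falls on 24 August 2025, several months after the `Tue Feb 18 10:11:23 CST 2025` date in RESULTS.md, which would be consistent with the model being packaged for upload well after decoding finished.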