
ESPnet2 ASR model

imprt/kushinada-hubert-large-laborotv2-asr

This is an ESPnet2 ASR model that uses imprt/kushinada-hubert-large as its speech frontend and was trained on LaboroTVSpeech2 and CSJ with ESPnet.

Demo: How to use with ESPnet

cd espnet
pip install -e .
cd egs2/laborotv/asr1
# Copy all files from this repository into the recipe directory.
# Store the imprt/kushinada-hubert-large checkpoint (s3prl/kushinada-hubert-large-s3prl.pt) under the exp/ directory.
# Run the tedx-jp-10k data preparation separately, then decode with the pretrained model:
./run_v2.sh --skip_data_prep true --skip_train true
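
Alternatively, a packaged checkpoint can be used directly from Python through ESPnet's Speech2Text interface. The sketch below is a minimal example and assumes this repository's archive can be loaded via espnet_model_zoo-style from_pretrained; the file name sample.wav is a placeholder.

import soundfile
from espnet2.bin.asr_inference import Speech2Text

# Load the packaged ASR model (assumes the archive is compatible with
# Speech2Text.from_pretrained; otherwise pass local config/model paths).
speech2text = Speech2Text.from_pretrained(
    "imprt/kushinada-hubert-large-laborotv2-asr",
    device="cpu",      # use "cuda" if a GPU is available
    beam_size=10,
)

# The recipe expects 16 kHz mono audio.
speech, rate = soundfile.read("sample.wav")
nbests = speech2text(speech)
text, tokens, token_ids, hyp = nbests[0]
print(text)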

RESULTS

Environments

  • date: Fri Mar 7 11:10:00 JST 2025
  • python version: 3.10.14 (main, Jul 10 2024, 13:18:49) [GCC 13.2.0]
  • espnet version: espnet 202402
  • pytorch version: pytorch 2.3.1+cu121
  • Git hash: 19787b1793eda2b4007aa5b2c4d03adf6c18abfb
    • Commit date: Fri Jun 14 19:27:35 2024 +0900

exp/asr_train_asr_conformer_kushinada_hubert_large_laborotv2_raw_jp_char_sp

CER

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_asr_lm_lm_train_lm_v2_jp_char_valid.loss.ave_asr_model_valid.acc.ave/tedx-jp-10k | 10000 | 190568 | 91.0 | 4.4 | 4.6 | 1.9 | 10.9 | 57.4 |
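
Err is the sum of the substitution, deletion, and insertion rates (4.4 + 4.6 + 1.9 = 10.9), i.e. the model reaches about 10.9% CER (57.4% sentence error rate) on the tedx-jp-10k evaluation set.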

Citation

Citing ESPnet

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}

or arXiv:

@misc{watanabe2018espnet,
  title={ESPnet: End-to-End Speech Processing Toolkit}, 
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  year={2018},
  eprint={1804.00015},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Citing LaboroTVSpeech

@inproceedings{9413425,
  author    = {Ando, Shintaro and Fujihara, Hiromasa},
  title     = {Construction of a Large-Scale Japanese ASR Corpus on TV Recordings},
  booktitle = {ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2021},
  pages     = {6948--6952},
  doi       = {10.1109/ICASSP39728.2021.9413425}
}

Citing CSJ

@inproceedings{maekawa03_sspr,
  title     = {Corpus of spontaneous Japanese: its design and evaluation},
  author    = {Kikuo Maekawa},
  year      = {2003},
  booktitle = {ISCA/IEEE Workshop on Spontaneous Speech Processing and Recognition},
  pages     = {paper MMO2},
}

License

Creative Commons Attribution-NonCommercial 4.0 International (CC-BY-NC-4.0)
