
ESPnet2 ASR model

imprt/kushinada-hubert-large-laborotv2-asr

This is an ESPnet2 ASR model that uses imprt/kushinada-hubert-large as its speech frontend and was trained on LaboroTVSpeech2 and CSJ with ESPnet.

Demo: How to use with ESPnet

cd espnet
pip install -e .
cd egs2/laborotv/asr1
# Copy all files from this repository into the recipe directory.
# Store the imprt/kushinada-hubert-large checkpoint (s3prl/kushinada-hubert-large-s3prl.pt) under the exp/ directory.
# Run the tedx-jp-10k data preparation separately, then decode with the pretrained model:
./run_v2.sh --skip_data_prep true --skip_train true
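
Alternatively, a packaged checkpoint can be used directly from Python through ESPnet's Speech2Text interface. The sketch below is a minimal example and assumes this repository's archive can be loaded via espnet_model_zoo-style from_pretrained; the file name sample.wav is a placeholder.

import soundfile
from espnet2.bin.asr_inference import Speech2Text

# Load the packaged ASR model (assumes the archive is compatible with
# Speech2Text.from_pretrained; otherwise pass local config/model paths).
speech2text = Speech2Text.from_pretrained(
    "imprt/kushinada-hubert-large-laborotv2-asr",
    device="cpu",      # use "cuda" if a GPU is available
    beam_size=10,
)

# The recipe expects 16 kHz mono audio.
speech, rate = soundfile.read("sample.wav")
nbests = speech2text(speech)
text, tokens, token_ids, hyp = nbests[0]
print(text)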

RESULTS

Environments

  • date: Fri Mar 7 11:10:00 JST 2025
  • python version: 3.10.14 (main, Jul 10 2024, 13:18:49) [GCC 13.2.0]
  • espnet version: espnet 202402
  • pytorch version: pytorch 2.3.1+cu121
  • Git hash: 19787b1793eda2b4007aa5b2c4d03adf6c18abfb
    • Commit date: Fri Jun 14 19:27:35 2024 +0900

exp/asr_train_asr_conformer_kushinada_hubert_large_laborotv2_raw_jp_char_sp

CER

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_asr_lm_lm_train_lm_v2_jp_char_valid.loss.ave_asr_model_valid.acc.ave/tedx-jp-10k | 10000 | 190568 | 91.0 | 4.4 | 4.6 | 1.9 | 10.9 | 57.4 |
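
Err is the sum of the substitution, deletion, and insertion rates (4.4 + 4.6 + 1.9 = 10.9), i.e. the model reaches about 10.9% CER (57.4% sentence error rate) on the tedx-jp-10k evaluation set.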

Citation

Citing ESPnet

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}

or arXiv:

@misc{watanabe2018espnet,
  title={ESPnet: End-to-End Speech Processing Toolkit}, 
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  year={2018},
  eprint={1804.00015},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Citing LaboroTVSpeech

@inproceedings{9413425,
  author    = {Ando, Shintaro and Fujihara, Hiromasa},
  title     = {Construction of a Large-Scale Japanese ASR Corpus on TV Recordings},
  booktitle = {ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2021},
  pages     = {6948--6952},
  doi       = {10.1109/ICASSP39728.2021.9413425}
}

Citing CSJ

@inproceedings{maekawa03_sspr,
  title     = {Corpus of spontaneous Japanese: its design and evaluation},
  author    = {Kikuo Maekawa},
  year      = {2003},
  booktitle = {ISCA/IEEE Workshop on Spontaneous Speech Processing and Recognition},
  pages     = {paper MMO2},
}

License

Creative Commons Attribution-NonCommercial 4.0 International (CC-BY-NC-4.0)
