ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration Paper • 2409.09506 • Published Sep 14, 2024 • 4
Towards Robust Speech Representation Learning for Thousands of Languages Paper • 2407.00837 • Published Jun 30, 2024 • 11
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification Paper • 2402.12654 • Published Feb 20, 2024 • 1
E-Branchformer: Branchformer with Enhanced merging for speech recognition Paper • 2210.00077 • Published Sep 30, 2022 • 2
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models Paper • 2305.17651 • Published May 28, 2023 • 1
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning Paper • 2309.15317 • Published Sep 26, 2023
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data Paper • 2309.13876 • Published Sep 25, 2023 • 1
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition Paper • 2303.07624 • Published Mar 14, 2023 • 1
VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks Paper • 2309.07937 • Published Sep 14, 2023
Improving Massively Multilingual ASR With Auxiliary CTC Objectives Paper • 2302.12829 • Published Feb 24, 2023
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding Paper • 2207.02971 • Published Jul 6, 2022
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer Paper • 2401.16658 • Published Jan 30, 2024 • 14