Automatic Speech Recognition

Commit b60c723 (verified) by leduckhai · 1 parent: b106bc6

Update README.md

Files changed (1): README.md (+42 −12)
README.md CHANGED
@@ -14,28 +14,58 @@ base_model:
 - openai/whisper-small
 new_version: leduckhai/MultiMed-ST
 pipeline_tag: automatic-speech-recognition
+license: mit
 ---
 # MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder
 
 Please refer to the newer version, which integrates ASR + MT models: [https://huggingface.co/leduckhai/MultiMed-ST](https://huggingface.co/leduckhai/MultiMed-ST)
 
 
-## Description:
-Multilingual automatic speech recognition (ASR) in the medical domain serves as a foundational task for various downstream applications such as speech translation, spoken language understanding, and voice-activated assistants.
-This technology enhances patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics.
-In this work, we introduce *MultiMed*, a collection of small-to-large end-to-end ASR models for the medical domain, spanning five languages: Vietnamese, English, German, French, and Mandarin Chinese, together with the corresponding real-world ASR dataset.
-To the best of our knowledge, *MultiMed* stands as **the largest and the first multilingual medical ASR dataset** in terms of total duration, number of speakers, diversity of diseases, recording conditions, speaker roles, unique medical terms, accents, and ICD-10 codes.
+**<div align="center">ACL 2025</div>**
+
+<div align="center"><b>Khai Le-Duc</b>, Phuc Phan, Tan-Hanh Pham, Bach Phan Tat,</div>
+
+<div align="center">Minh-Huong Ngo, Chris Ngo, Thanh Nguyen-Tang, Truong-Son Hy</div>
+
+> Please press the ⭐ button and/or cite our papers if you find them helpful.
+
+<p align="center">
+  <img src="MultiMed_ACL2025.png" width="700"/>
+</p>
+
+* **Abstract:**
+Multilingual automatic speech recognition (ASR) in the medical domain serves as a foundational task for various downstream applications such as speech translation, spoken language understanding, and voice-activated assistants. This technology improves patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we introduce MultiMed, the first multilingual medical ASR dataset, along with the first collection of small-to-large end-to-end medical ASR models, spanning five languages: Vietnamese, English, German, French, and Mandarin Chinese. To the best of our knowledge, MultiMed stands as **the world's largest medical ASR dataset across all major benchmarks**: total duration, number of recording conditions, number of accents, and number of speaking roles. Furthermore, we present the first multilinguality study for medical ASR, which includes reproducible empirical baselines, a monolinguality-multilinguality analysis, an Attention Encoder Decoder (AED) vs. hybrid comparative study, and a linguistic analysis. We present practical ASR end-to-end training schemes optimized for a fixed number of trainable parameters that are common in industry settings. All code, data, and models are available online: [https://github.com/leduckhai/MultiMed/tree/master/MultiMed](https://github.com/leduckhai/MultiMed/tree/master/MultiMed).
 
+* **Citation:**
 Please cite this paper: [https://arxiv.org/abs/2409.14074](https://arxiv.org/abs/2409.14074)
 
-@inproceedings{le2024multimed,
-  title={MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder},
-  author={Le-Duc, Khai and Phan, Phuc and Pham, Tan-Hanh and Tat, Bach Phan and Ngo, Minh-Huong and Hy, Truong-Son},
-  journal={arXiv preprint arXiv:2409.14074},
-  year={2024}
-}
-To load labeled data, please refer to our [HuggingFace](https://huggingface.co/datasets/leduckhai/MultiMed) and [Papers with Code](https://paperswithcode.com/dataset/multimed) pages.
+```bibtex
+@article{le2024multimed,
+  title={MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder},
+  author={Le-Duc, Khai and Phan, Phuc and Pham, Tan-Hanh and Tat, Bach Phan and Ngo, Minh-Huong and Ngo, Chris and Nguyen-Tang, Thanh and Hy, Truong-Son},
+  journal={arXiv preprint arXiv:2409.14074},
+  year={2024}
+}
+```
+
+This repository contains scripts for medical automatic speech recognition (ASR) in five languages: Vietnamese, English, German, French, and Mandarin Chinese.
+The provided scripts cover model preparation, training, inference, and evaluation, based on the *MultiMed* dataset.
+
+## Dataset and Pre-trained Models:
+
+Dataset: [🤗 HuggingFace dataset](https://huggingface.co/datasets/leduckhai/MultiMed), [Papers with Code dataset](https://paperswithcode.com/dataset/multimed)
+
+Pre-trained models: [🤗 HuggingFace models](https://huggingface.co/leduckhai/MultiMed)
+
+| Model Name | Description | Link |
+|------------|-------------|------|
+| `Whisper-Small-Chinese` | Small model fine-tuned on the medical Chinese set | [Hugging Face models](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-chinese) |
+| `Whisper-Small-English` | Small model fine-tuned on the medical English set | [Hugging Face models](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-english) |
+| `Whisper-Small-French` | Small model fine-tuned on the medical French set | [Hugging Face models](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-french) |
+| `Whisper-Small-German` | Small model fine-tuned on the medical German set | [Hugging Face models](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-german) |
+| `Whisper-Small-Vietnamese` | Small model fine-tuned on the medical Vietnamese set | [Hugging Face models](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-vietnamese) |
+| `Whisper-Small-Multilingual` | Small model fine-tuned on the medical multilingual set (5 languages) | [Hugging Face models](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-multilingual) |
 
 ## Contact:
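The fine-tuned checkpoints in the model table above live as subfolders of the `leduckhai/MultiMed-ST` repo, so they can be loaded with Hugging Face `transformers`. A minimal sketch, assuming that repo/subfolder layout; the `checkpoint_for` helper and the `sample.wav` path are illustrative, not part of the repository:

```python
def checkpoint_for(language: str) -> tuple[str, str]:
    """Return (repo_id, subfolder) for a MultiMed fine-tuned
    Whisper-Small checkpoint, following the model table above."""
    checkpoints = {
        "chinese": "whisper-small-chinese",
        "english": "whisper-small-english",
        "french": "whisper-small-french",
        "german": "whisper-small-german",
        "vietnamese": "whisper-small-vietnamese",
        "multilingual": "whisper-small-multilingual",
    }
    return "leduckhai/MultiMed-ST", f"asr/{checkpoints[language.lower()]}"


if __name__ == "__main__":
    # Requires `pip install transformers torch` and network access
    # to download the checkpoint from the Hugging Face Hub.
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    repo_id, subfolder = checkpoint_for("english")
    processor = WhisperProcessor.from_pretrained(repo_id, subfolder=subfolder)
    model = WhisperForConditionalGeneration.from_pretrained(
        repo_id, subfolder=subfolder
    )
    # Transcribe a local 16 kHz audio file (placeholder path).
    # inputs = processor("sample.wav", return_tensors="pt")
```

The labeled *MultiMed* dataset linked above can likewise be fetched from its Hugging Face dataset page.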