Update README.md

Add Granary arXiv link

@@ -300,7 +300,7 @@ Training was conducted using this [example script](https://github.com/NVIDIA/NeM
 The tokenizer was constructed from the training set transcripts using this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 ### <span style="color:#466f00;">Training Dataset</span>
-The model was trained on the Granary dataset, consisting of approximately 120,000 hours of English speech data:
 - 10,000 hours from human-transcribed NeMo ASR Set 3.0, including:
   - LibriSpeech (960 hours)
@@ -318,7 +318,7 @@ The model was trained on the Granary dataset, consisting of approximately 120,00
   - YODAS dataset [5]
   - Librilight [7]
-All transcriptions preserve punctuation and capitalization. The Granary dataset will be made publicly available after presentation at Interspeech 2025.
 **Data Collection Method by dataset**
@@ -398,6 +398,8 @@ These WER scores were obtained using greedy decoding without an external languag
 [7] [MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages](https://arxiv.org/abs/2410.01036)
 ## <span style="color:#466f00;">Inference:</span>
 **Engine**:

 The tokenizer was constructed from the training set transcripts using this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 ### <span style="color:#466f00;">Training Dataset</span>
+The model was trained on the Granary dataset [8], consisting of approximately 120,000 hours of English speech data:
 - 10,000 hours from human-transcribed NeMo ASR Set 3.0, including:
   - LibriSpeech (960 hours)
   - YODAS dataset [5]
   - Librilight [7]
+All transcriptions preserve punctuation and capitalization. The Granary dataset [8] will be made publicly available after presentation at Interspeech 2025.
 **Data Collection Method by dataset**
 [7] [MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages](https://arxiv.org/abs/2410.01036)
+[8] [Granary: Speech Recognition and Translation Dataset in 25 European Languages](https://arxiv.org/pdf/2505.13404)
 ## <span style="color:#466f00;">Inference:</span>
 **Engine**: