nithinraok committed
Commit c4b828d · verified · 1 Parent(s): 30c5e6f

Update README.md

Add Granary arxiv link

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -300,7 +300,7 @@ Training was conducted using this [example script](https://github.com/NVIDIA/NeM
  The tokenizer was constructed from the training set transcripts using this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 
  ### <span style="color:#466f00;">Training Dataset</span>
- The model was trained on the Granary dataset, consisting of approximately 120,000 hours of English speech data:
+ The model was trained on the Granary dataset[8], consisting of approximately 120,000 hours of English speech data:
 
  - 10,000 hours from human-transcribed NeMo ASR Set 3.0, including:
  - LibriSpeech (960 hours)
@@ -318,7 +318,7 @@ The model was trained on the Granary dataset, consisting of approximately 120,00
  - YODAS dataset [5]
  - Librilight [7]
 
- All transcriptions preserve punctuation and capitalization. The Granary dataset will be made publicly available after presentation at Interspeech 2025.
+ All transcriptions preserve punctuation and capitalization. The Granary dataset[8] will be made publicly available after presentation at Interspeech 2025.
 
  **Data Collection Method by dataset**
 
@@ -398,6 +398,8 @@ These WER scores were obtained using greedy decoding without an external languag
 
  [7] [MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages](https://arxiv.org/abs/2410.01036)
 
+ [8] [Granary: Speech Recognition and Translation Dataset in 25 European Languages](https://arxiv.org/pdf/2505.13404)
+
  ## <span style="color:#466f00;">Inference:</span>
 
  **Engine**: