Update README.md
README.md
@@ -113,8 +113,6 @@ Compared to the base model, this version:
 - Does **not** include punctuation or uppercase letters.
 - Was trained on **9,500+ hours** of diverse, manually transcribed French speech.
 
-The training code is available in the [nemo asr training repository](https://github.com/linagora-labs/nemo_asr_training).
-
 ---
 
 ## Performance
@@ -174,7 +172,43 @@ asr_model.change_decoding_strategy(decoder_type="ctc")
 asr_model.transcribe([audio_path])
 ```
 
-##
+## Training Details
+
+The training code is available in the [nemo_asr_training repository](https://github.com/linagora-labs/nemo_asr_training).
+The full configuration used for fine-tuning is available [here](https://github.com/linagora-labs/nemo_asr_training/blob/main/fastconformer/yamls/nvidia_stt_fr_fastconformer_hybrid_large_pc.yaml).
+
+### Hardware
+- 1× NVIDIA H100 GPU (80 GB)
+
+### Training Configuration
+- Precision: BF16 mixed precision
+- Max training steps: 100,000
+- Gradient accumulation: 4 batches
+
+### Tokenizer
+- Type: SentencePiece
+- Vocabulary size: 1,024 tokens
+
+### Optimization
+- Optimizer: `AdamW`
+- Learning rate: `1e-5`
+- Betas: `[0.9, 0.98]`
+- Weight decay: `1e-3`
+- Scheduler: `CosineAnnealing`
+- Warmup steps: 10,000
+- Minimum learning rate: `1e-6`
+
+### Data Setup
+- 6 duration buckets (ranging from 0.1s to 30s)
+- Batch sizes per bucket:
+  - Bucket 1 (shortest segments): batch size 80
+  - Bucket 2: batch size 76
+  - Bucket 3: batch size 72
+  - Bucket 4: batch size 68
+  - Bucket 5: batch size 64
+  - Bucket 6 (longest segments): batch size 60
+
+### Training Datasets
 
 The data were transformed, processed, and converted using [NeMo tools from the SSAK repository](https://github.com/linagora-labs/ssak/tree/main/tools/nemo).
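
As a worked illustration of the decoding switch shown in the second hunk, the sketch below loads the model with NeMo, moves the hybrid checkpoint from its default RNN-T head to CTC decoding, and transcribes a file. The checkpoint id and audio path are placeholders, not names taken from this README.

```python
import nemo.collections.asr as nemo_asr

# Placeholder checkpoint id: substitute the actual published name of this model.
asr_model = nemo_asr.models.ASRModel.from_pretrained("linagora-labs/placeholder_model_name")

# Hybrid RNN-T/CTC checkpoints decode with the RNN-T head by default;
# this switches to the CTC head, as in the README's usage snippet.
asr_model.change_decoding_strategy(decoder_type="ctc")

audio_path = "example_16khz_mono.wav"  # placeholder: a 16 kHz mono recording
print(asr_model.transcribe([audio_path]))
```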
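The Training Configuration and Optimization bullets map onto a PyTorch Lightning trainer plus NeMo's usual `optim` config block roughly as follows. This is a sketch: `devices` and `accelerator` are assumed from the single-H100 hardware note rather than stated in the README.

```python
import pytorch_lightning as pl
from omegaconf import OmegaConf

trainer = pl.Trainer(
    devices=1,                  # assumption: one GPU, per the Hardware section
    accelerator="gpu",
    precision="bf16-mixed",     # BF16 mixed precision
    max_steps=100_000,          # max training steps
    accumulate_grad_batches=4,  # gradient accumulation over 4 batches
)

# Optimizer and scheduler values copied from the Optimization bullets.
optim_cfg = OmegaConf.create({
    "name": "adamw",
    "lr": 1e-5,
    "betas": [0.9, 0.98],
    "weight_decay": 1e-3,
    "sched": {
        "name": "CosineAnnealing",
        "warmup_steps": 10_000,
        "min_lr": 1e-6,
    },
})
# With a NeMo model instance: asr_model.setup_optimization(optim_config=optim_cfg)
```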
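The Tokenizer section describes a 1,024-token SentencePiece vocabulary. A minimal way to build a comparable tokenizer with the `sentencepiece` library is shown below; the corpus path and `character_coverage` choice are assumptions, and in practice the training repository would normally drive NeMo's own tokenizer-construction tooling.

```python
import sentencepiece as spm

# Placeholder corpus: normalized transcripts (lowercase, no punctuation,
# matching the model's output space described above).
spm.SentencePieceTrainer.train(
    input="corpus_fr.txt",
    model_prefix="tokenizer_spe_1024",
    vocab_size=1024,            # matches the vocabulary size above
    character_coverage=1.0,     # assumption: keep all French characters
)

sp = spm.SentencePieceProcessor(model_file="tokenizer_spe_1024.model")
print(sp.encode("bonjour tout le monde", out_type=str))
```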
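Finally, the Data Setup bullets describe duration bucketing: utterances are grouped by length so that shorter segments train with larger batches. Only the 0.1s and 30s bounds and the per-bucket batch sizes are documented; the interior bucket edges in this sketch are hypothetical, chosen just to make the scheme concrete.

```python
import bisect

# Hypothetical interior edges: only the 0.1 s lower and 30 s upper bounds are documented.
BUCKET_EDGES = [5.0, 10.0, 15.0, 20.0, 25.0]
BUCKET_BATCH_SIZES = [80, 76, 72, 68, 64, 60]  # bucket 1 (shortest) ... bucket 6 (longest)

def bucket_and_batch_size(duration_s: float) -> tuple[int, int]:
    """Return (1-based bucket index, batch size) for an utterance duration."""
    if not 0.1 <= duration_s <= 30.0:
        raise ValueError("duration outside the 0.1-30 s training range")
    idx = bisect.bisect_left(BUCKET_EDGES, duration_s)
    return idx + 1, BUCKET_BATCH_SIZES[idx]

print(bucket_and_batch_size(3.2))   # (1, 80)
print(bucket_and_batch_size(29.0))  # (6, 60)
```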