Update README.md
README.md
@@ -113,8 +113,6 @@ Compared to the base model, this version:
 - Does **not** include punctuation or uppercase letters.
 - Was trained on **9,500+ hours** of diverse, manually transcribed French speech.
 
-The training code is available in the [nemo asr training repository](https://github.com/linagora-labs/nemo_asr_training).
-
 ---
 
 ## Performance
@@ -174,7 +172,43 @@ asr_model.change_decoding_strategy(decoder_type="ctc")
 asr_model.transcribe([audio_path])
 ```
 
-##
+## Training Details
+
+The training code is available in the [nemo_asr_training repository](https://github.com/linagora-labs/nemo_asr_training).
+The full configuration used for fine-tuning is available [here](https://github.com/linagora-labs/nemo_asr_training/blob/main/fastconformer/yamls/nvidia_stt_fr_fastconformer_hybrid_large_pc.yaml).
+
+### Hardware
+- 1× NVIDIA H100 GPU (80 GB)
+
+### Training Configuration
+- Precision: BF16 mixed precision
+- Max training steps: 100,000
+- Gradient accumulation: 4 batches
+
+### Tokenizer
+- Type: SentencePiece
+- Vocabulary size: 1,024 tokens
+
+### Optimization
+- Optimizer: `AdamW`
+- Learning rate: `1e-5`
+- Betas: `[0.9, 0.98]`
+- Weight decay: `1e-3`
+- Scheduler: `CosineAnnealing`
+- Warmup steps: 10,000
+- Minimum learning rate: `1e-6`
+
+### Data Setup
+- 6 duration buckets (ranging from 0.1s to 30s)
+- Batch sizes per bucket:
+  - Bucket 1 (shortest segments): batch size 80
+  - Bucket 2: batch size 76
+  - Bucket 3: batch size 72
+  - Bucket 4: batch size 68
+  - Bucket 5: batch size 64
+  - Bucket 6 (longest segments): batch size 60
+
+### Training Datasets
 
 The data were transformed, processed, and converted using [NeMo tools from the SSAK repository](https://github.com/linagora-labs/ssak/tree/main/tools/nemo).
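
As a worked illustration of the decoding switch shown in the second hunk, the sketch below loads the model with NeMo, moves the hybrid checkpoint from its default RNN-T head to CTC decoding, and transcribes a file. The checkpoint id and audio path are placeholders, not names taken from this README.

```python
import nemo.collections.asr as nemo_asr

# Placeholder checkpoint id: substitute the actual published name of this model.
asr_model = nemo_asr.models.ASRModel.from_pretrained("linagora-labs/placeholder_model_name")

# Hybrid RNN-T/CTC checkpoints decode with the RNN-T head by default;
# this switches to the CTC head, as in the README's usage snippet.
asr_model.change_decoding_strategy(decoder_type="ctc")

audio_path = "example_16khz_mono.wav"  # placeholder: a 16 kHz mono recording
print(asr_model.transcribe([audio_path]))
```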
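The Training Configuration and Optimization bullets map onto a PyTorch Lightning trainer plus NeMo's usual `optim` config block roughly as follows. This is a sketch: `devices` and `accelerator` are assumed from the single-H100 hardware note rather than stated in the README.

```python
import pytorch_lightning as pl
from omegaconf import OmegaConf

trainer = pl.Trainer(
    devices=1,                  # assumption: one GPU, per the Hardware section
    accelerator="gpu",
    precision="bf16-mixed",     # BF16 mixed precision
    max_steps=100_000,          # max training steps
    accumulate_grad_batches=4,  # gradient accumulation over 4 batches
)

# Optimizer and scheduler values copied from the Optimization bullets.
optim_cfg = OmegaConf.create({
    "name": "adamw",
    "lr": 1e-5,
    "betas": [0.9, 0.98],
    "weight_decay": 1e-3,
    "sched": {
        "name": "CosineAnnealing",
        "warmup_steps": 10_000,
        "min_lr": 1e-6,
    },
})
# With a NeMo model instance: asr_model.setup_optimization(optim_config=optim_cfg)
```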
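The Tokenizer section describes a 1,024-token SentencePiece vocabulary. A minimal way to build a comparable tokenizer with the `sentencepiece` library is shown below; the corpus path and `character_coverage` choice are assumptions, and in practice the training repository would normally drive NeMo's own tokenizer-construction tooling.

```python
import sentencepiece as spm

# Placeholder corpus: normalized transcripts (lowercase, no punctuation,
# matching the model's output space described above).
spm.SentencePieceTrainer.train(
    input="corpus_fr.txt",
    model_prefix="tokenizer_spe_1024",
    vocab_size=1024,            # matches the vocabulary size above
    character_coverage=1.0,     # assumption: keep all French characters
)

sp = spm.SentencePieceProcessor(model_file="tokenizer_spe_1024.model")
print(sp.encode("bonjour tout le monde", out_type=str))
```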
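Finally, the Data Setup bullets describe duration bucketing: utterances are grouped by length so that shorter segments train with larger batches. Only the 0.1s and 30s bounds and the per-bucket batch sizes are documented; the interior bucket edges in this sketch are hypothetical, chosen just to make the scheme concrete.

```python
import bisect

# Hypothetical interior edges: only the 0.1 s lower and 30 s upper bounds are documented.
BUCKET_EDGES = [5.0, 10.0, 15.0, 20.0, 25.0]
BUCKET_BATCH_SIZES = [80, 76, 72, 68, 64, 60]  # bucket 1 (shortest) ... bucket 6 (longest)

def bucket_and_batch_size(duration_s: float) -> tuple[int, int]:
    """Return (1-based bucket index, batch size) for an utterance duration."""
    if not 0.1 <= duration_s <= 30.0:
        raise ValueError("duration outside the 0.1-30 s training range")
    idx = bisect.bisect_left(BUCKET_EDGES, duration_s)
    return idx + 1, BUCKET_BATCH_SIZES[idx]

print(bucket_and_batch_size(3.2))   # (1, 80)
print(bucket_and_batch_size(29.0))  # (6, 60)
```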