Fine-Tuned Model
fjmgAI/whisper-large-v3-ATC
Base Model
unsloth/whisper-large-v3
Fine-Tuning Method
Fine-tuning was performed using unsloth, an efficient fine-tuning framework optimized for low-resource environments.
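As a rough orientation, loading the base model through Unsloth for fine-tuning typically looks like the sketch below. The whisper_language/whisper_task arguments mirror the inference snippet later in this card; the LoRA settings are illustrative assumptions, since the card does not state whether adapters or full fine-tuning were used.

from unsloth import FastModel
from transformers import WhisperForConditionalGeneration

# Load the base checkpoint through Unsloth (same arguments as the inference example below).
model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/whisper-large-v3",
    dtype = None,                 # auto-select the best dtype for the GPU
    load_in_4bit = False,
    auto_model = WhisperForConditionalGeneration,
    whisper_language = "English",
    whisper_task = "transcribe",
)

# Illustrative LoRA configuration (assumed values, not the published recipe).
model = FastModel.get_peft_model(
    model,
    r = 32,
    lora_alpha = 32,
    lora_dropout = 0.0,
    target_modules = ["q_proj", "k_proj", "v_proj", "out_proj"],
)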
Dataset
Description
This dataset contains 14,830 transcription examples with their corresponding audio files, drawn from two main sources, ATCO2 and the UWB-ATCC corpus, selected specifically for aviation-related communications.
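The card does not publish a single dataset ID or schema for this mix, but audio/transcription pairs like these are commonly handled with the datasets library. The sketch below is hypothetical: it assumes a local audiofolder layout with a transcription column, and resamples to the 16 kHz rate Whisper expects.

from datasets import load_dataset, Audio

# Hypothetical layout: audio clips plus a metadata.csv with file_name/transcription columns.
ds = load_dataset("audiofolder", data_dir="atc_clips")
# Whisper's feature extractor expects 16 kHz mono audio.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

sample = ds["train"][0]
print(sample["audio"]["array"].shape, sample["transcription"])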
Fine-Tuning Details
- The model was trained using the Seq2SeqTrainer.
- The Word Error Rate (WER) was used as the evaluation metric to track the model's performance during fine-tuning (a minimal metric setup is sketched below).
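A minimal sketch of what the WER evaluation hook can look like when passed to Seq2SeqTrainer via compute_metrics (with predict_with_generate=True in the training arguments). The tokenizer source below is an assumption; the published training configuration is not reproduced here.

import evaluate
from transformers import WhisperTokenizer

wer_metric = evaluate.load("wer")  # uses jiwer under the hood
# Assumed tokenizer source; any Whisper large-v3 tokenizer with the same vocabulary works.
whisper_tok = WhisperTokenizer.from_pretrained(
    "unsloth/whisper-large-v3", language="English", task="transcribe"
)

def compute_metrics(pred):
    # Labels use -100 for padding; swap it back to the pad token before decoding.
    label_ids = pred.label_ids
    label_ids[label_ids == -100] = whisper_tok.pad_token_id
    pred_str = whisper_tok.batch_decode(pred.predictions, skip_special_tokens=True)
    label_str = whisper_tok.batch_decode(label_ids, skip_special_tokens=True)
    return {"wer": 100 * wer_metric.compute(predictions=pred_str, references=label_str)}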
Usage
Direct Usage (Unsloth)
First install the dependencies:
Colab Version
%%capture
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
!pip install transformers==4.51.3
!pip install --no-deps unsloth
!pip install librosa soundfile evaluate jiwer
Non-Colab Version
pip install unsloth
pip install librosa soundfile evaluate jiwer
Then you can load this model and run inference.
import torch
from unsloth import FastModel
from transformers import pipeline, WhisperForConditionalGeneration

# Load the fine-tuned checkpoint through Unsloth.
model, tokenizer = FastModel.from_pretrained(
    model_name = "fjmgAI/whisper-large-v3-ATC",
    dtype = None,                 # auto-select the best dtype for the GPU
    load_in_4bit = False,
    auto_model = WhisperForConditionalGeneration,
    whisper_language = "English",
    whisper_task = "transcribe",
)

# Pin the decoder to English transcription and clear suppressed tokens and
# forced decoder IDs so the generation config fully controls decoding.
model.generation_config.language = "<|en|>"
model.generation_config.task = "transcribe"
model.config.suppress_tokens = []
model.generation_config.forced_decoder_ids = None

# Build an automatic-speech-recognition pipeline around the model.
whisper = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=tokenizer.tokenizer,
    feature_extractor=tokenizer.feature_extractor,
    processor=tokenizer,
    return_language=True,
    torch_dtype=torch.float16,
)

# Transcribe an audio file.
audio_file = "audio_example.flac"
transcribed_text = whisper(audio_file)
print(transcribed_text["text"])
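For longer recordings, the same pipeline can split the audio into chunks and batch them; the file name and chunk settings below are placeholders, not values from this card.

# Optional: transcribe a longer recording in 30-second chunks.
long_result = whisper("long_tower_recording.wav", chunk_length_s=30, batch_size=8)
print(long_result["text"])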
Purpose
This fine-tuned model is designed for Speech-to-Text (STT) applications in Air Traffic Control (ATC) environments. Fine-tuning on a specialized ATC dataset improves robustness and precision when transcribing ATC recordings, with the aim of delivering accurate and reliable transcription while maintaining efficient performance.
- Developed by: fjmgAI
- License: apache-2.0