Fine-Tuned Model
fjmgAI/whisper-large-v3-ATC
Base Model
unsloth/whisper-large-v3
Fine-Tuning Method
Fine-tuning was performed using unsloth, an efficient fine-tuning framework optimized for low-resource environments.
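As a rough orientation, loading the base model through Unsloth for fine-tuning typically looks like the sketch below. The whisper_language/whisper_task arguments mirror the inference snippet later in this card; the LoRA settings are illustrative assumptions, since the card does not state whether adapters or full fine-tuning were used.

from unsloth import FastModel
from transformers import WhisperForConditionalGeneration

# Load the base checkpoint through Unsloth (same arguments as the inference example below).
model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/whisper-large-v3",
    dtype = None,                 # auto-select the best dtype for the GPU
    load_in_4bit = False,
    auto_model = WhisperForConditionalGeneration,
    whisper_language = "English",
    whisper_task = "transcribe",
)

# Illustrative LoRA configuration (assumed values, not the published recipe).
model = FastModel.get_peft_model(
    model,
    r = 32,
    lora_alpha = 32,
    lora_dropout = 0.0,
    target_modules = ["q_proj", "k_proj", "v_proj", "out_proj"],
)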
Dataset
Description
This dataset contains 14,830 transcription examples with their corresponding audio files, drawn from two main sources, ATCO2 and the UWB-ATCC corpus, selected specifically for aviation-related communications.
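The card does not publish a single dataset ID or schema for this mix, but audio/transcription pairs like these are commonly handled with the datasets library. The sketch below is hypothetical: it assumes a local audiofolder layout with a transcription column, and resamples to the 16 kHz rate Whisper expects.

from datasets import load_dataset, Audio

# Hypothetical layout: audio clips plus a metadata.csv with file_name/transcription columns.
ds = load_dataset("audiofolder", data_dir="atc_clips")
# Whisper's feature extractor expects 16 kHz mono audio.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

sample = ds["train"][0]
print(sample["audio"]["array"].shape, sample["transcription"])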
Fine-Tuning Details
- The model was trained using the Seq2SeqTrainer.
- The Word Error Rate (WER) was used as the evaluation metric to track the model's performance during fine-tuning (a minimal metric setup is sketched below).
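A minimal sketch of what the WER evaluation hook can look like when passed to Seq2SeqTrainer via compute_metrics (with predict_with_generate=True in the training arguments). The tokenizer source below is an assumption; the published training configuration is not reproduced here.

import evaluate
from transformers import WhisperTokenizer

wer_metric = evaluate.load("wer")  # uses jiwer under the hood
# Assumed tokenizer source; any Whisper large-v3 tokenizer with the same vocabulary works.
whisper_tok = WhisperTokenizer.from_pretrained(
    "unsloth/whisper-large-v3", language="English", task="transcribe"
)

def compute_metrics(pred):
    # Labels use -100 for padding; swap it back to the pad token before decoding.
    label_ids = pred.label_ids
    label_ids[label_ids == -100] = whisper_tok.pad_token_id
    pred_str = whisper_tok.batch_decode(pred.predictions, skip_special_tokens=True)
    label_str = whisper_tok.batch_decode(label_ids, skip_special_tokens=True)
    return {"wer": 100 * wer_metric.compute(predictions=pred_str, references=label_str)}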
Usage
Direct Usage (Unsloth)
First install the dependencies:
Colab Version
%%capture
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
!pip install transformers==4.51.3
!pip install --no-deps unsloth
!pip install librosa soundfile evaluate jiwer
Non-Colab Version
pip install unsloth
pip install librosa soundfile evaluate jiwer
Then you can load this model and run inference.
import torch
from unsloth import FastModel
from transformers import pipeline, WhisperForConditionalGeneration

# Load the fine-tuned checkpoint through Unsloth.
model, tokenizer = FastModel.from_pretrained(
    model_name = "fjmgAI/whisper-large-v3-ATC",
    dtype = None,                 # auto-select the best dtype for the GPU
    load_in_4bit = False,
    auto_model = WhisperForConditionalGeneration,
    whisper_language = "English",
    whisper_task = "transcribe",
)

# Pin the decoder to English transcription and clear suppressed tokens and
# forced decoder IDs so the generation config fully controls decoding.
model.generation_config.language = "<|en|>"
model.generation_config.task = "transcribe"
model.config.suppress_tokens = []
model.generation_config.forced_decoder_ids = None

# Build an automatic-speech-recognition pipeline around the model.
whisper = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=tokenizer.tokenizer,
    feature_extractor=tokenizer.feature_extractor,
    processor=tokenizer,
    return_language=True,
    torch_dtype=torch.float16,
)

# Transcribe an audio file.
audio_file = "audio_example.flac"
transcribed_text = whisper(audio_file)
print(transcribed_text["text"])
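For longer recordings, the same pipeline can split the audio into chunks and batch them; the file name and chunk settings below are placeholders, not values from this card.

# Optional: transcribe a longer recording in 30-second chunks.
long_result = whisper("long_tower_recording.wav", chunk_length_s=30, batch_size=8)
print(long_result["text"])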
Purpose
This fine-tuned model is designed for Speech-to-Text (STT) applications in Air Traffic Control (ATC) environments. Fine-tuning on a specialized ATC dataset improves robustness and precision when transcribing ATC recordings, with the aim of delivering accurate and reliable transcription while maintaining efficient performance.
- Developed by: fjmgAI
- License: apache-2.0