Fine-Tuned Model

fjmgAI/whisper-large-v3-ATC

Base Model

unsloth/whisper-large-v3

Fine-Tuning Method

Fine-tuning was performed using unsloth, an efficient fine-tuning framework optimized for low-resource environments.
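
As a rough illustration only (the exact training script is not published here), the base model can be prepared for fine-tuning through unsloth's FastModel, following the same loading pattern used for inference further below. The LoRA settings shown (rank, alpha, target modules) are assumptions, not the configuration actually used for this model.

from unsloth import FastModel
from transformers import WhisperForConditionalGeneration

# Load the base checkpoint through unsloth
model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/whisper-large-v3",
    dtype = None,
    load_in_4bit = False,
    auto_model = WhisperForConditionalGeneration,
    whisper_language = "English",
    whisper_task = "transcribe",
)

# Attach LoRA adapters for parameter-efficient fine-tuning
# (rank, alpha, and target modules are illustrative assumptions)
model = FastModel.get_peft_model(
    model,
    r = 64,
    lora_alpha = 64,
    target_modules = ["q_proj", "v_proj"],
)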

Dataset

jacktol/atc-dataset

Description

This dataset contains 14,830 examples of transcriptions and corresponding audio files drawn from two main sources, the ATCO2 and UWB-ATCC corpora, specifically selected for aviation-related communications.
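
To inspect the data before training or evaluation, the dataset can be loaded with the datasets library. This is a minimal sketch; the split name and column names are assumptions and should be checked against the dataset card.

from datasets import load_dataset

# Load the ATC dataset from the Hugging Face Hub
dataset = load_dataset("jacktol/atc-dataset", split="train")

print(dataset)             # row count and available columns
print(dataset[0]["text"])  # one example transcription (column name assumed)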

Fine-Tuning Details

  • The model was trained using the Seq2SeqTrainer.
  • The Word Error Rate (WER) was employed as the evaluation metric to track and optimize the model's transcription quality during fine-tuning; a minimal sketch of this setup is shown below.
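
The sketch below illustrates a Seq2SeqTrainer setup with WER as the evaluation metric. The actual hyperparameters, preprocessing, and data collator are not published here, so the values and helper names (processor, train_ds, eval_ds, data_collator) are illustrative assumptions.

import evaluate
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

wer_metric = evaluate.load("wer")

def compute_metrics(pred):
    # Replace the -100 label padding, then decode predictions and labels to text
    label_ids = pred.label_ids
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id
    pred_str = processor.tokenizer.batch_decode(pred.predictions, skip_special_tokens=True)
    label_str = processor.tokenizer.batch_decode(label_ids, skip_special_tokens=True)
    return {"wer": 100 * wer_metric.compute(predictions=pred_str, references=label_str)}

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-large-v3-atc",
    per_device_train_batch_size=8,   # illustrative value
    learning_rate=1e-5,              # illustrative value
    eval_strategy="steps",
    predict_with_generate=True,      # required so WER is computed on generated text
    metric_for_best_model="wer",
    greater_is_better=False,         # lower WER is better
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,          # preprocessed train split (not shown)
    eval_dataset=eval_ds,            # preprocessed eval split (not shown)
    data_collator=data_collator,     # speech seq2seq collator (not shown)
    compute_metrics=compute_metrics,
)
trainer.train()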

Usage

Direct Usage (Unsloth)

First install the dependencies:

Colab Version

%%capture

!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
!pip install transformers==4.51.3
!pip install --no-deps unsloth
!pip install librosa soundfile evaluate jiwer

Non-Colab Version

pip install unsloth
pip install librosa soundfile evaluate jiwer

Then you can load this model and run inference.

import torch
from unsloth import FastModel
from transformers import pipeline
from transformers import WhisperForConditionalGeneration


# Load the fine-tuned Whisper model and its processor through unsloth
model, tokenizer = FastModel.from_pretrained(
    model_name = "fjmgAI/whisper-large-v3-ATC",
    dtype = None,
    load_in_4bit = False,
    auto_model = WhisperForConditionalGeneration,
    whisper_language = "English",
    whisper_task = "transcribe",
)

# Force English transcription and clear token suppression / forced decoder ids
model.generation_config.language = "<|en|>"
model.generation_config.task = "transcribe"
model.config.suppress_tokens = []
model.generation_config.forced_decoder_ids = None

# Build a standard ASR pipeline; `tokenizer` here is the Whisper processor,
# which exposes both the tokenizer and the feature extractor
whisper = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=tokenizer.tokenizer,
    feature_extractor=tokenizer.feature_extractor,
    processor=tokenizer,
    return_language=True,
    torch_dtype=torch.float16
)

audio_file = "audio_example.flac"

transcribed_text = whisper(audio_file)

print(transcribed_text["text"])
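
For recordings longer than a short clip, the same pipeline can transcribe in chunks via the ASR pipeline's built-in chunk_length_s option. This is a minimal sketch with a placeholder file name.

# Split a long ATC recording into 30-second chunks and batch them through the model
long_transcription = whisper(
    "long_atc_recording.wav",  # placeholder file name
    chunk_length_s=30,
    batch_size=8,
)

print(long_transcription["text"])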

Purpose

This fine-tuned model is designed for Speech-to-Text (STT) applications in Air Traffic Control (ATC) environments. Fine-tuning on a specialized ATC dataset improves robustness and precision when transcribing ATC radio recordings, while keeping inference efficient.

  • Developed by: fjmgAI
  • License: apache-2.0
