metadata

library_name: transformers
language:
  - sq
license: mit
base_model: openai/whisper-large-v3-turbo
datasets:
  - Kushtrim/audioshqip-200h
metrics:
  - wer
model-index:
  - name: Whisper Large v3 Turbo Shqip
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: Audio Shqip 200 orë
          type: Kushtrim/audioshqip-200h
          args: 'config: sq, split: test'
        metrics:
          - type: wer
            value: 19.891368436098556
            name: Wer

Whisper Large V3 Turbo Shqip

This model is a fine-tuned version of openai/whisper-large-v3-turbo for Albanian, including the Gheg dialect. It was trained on Kushtrim/audioshqip-200h, a curated dataset of 200 hours of Albanian audio, and the metadata reports a word error rate (WER) of 19.89 on that dataset's test split.

Key Features

Language Coverage: Covers standard Albanian as well as the Gheg dialect, so transcription is intended to hold up across regional variation.
Dataset: Fine-tuned on 200 hours of annotated Albanian audio spanning a range of accents, speech contexts, and domains.
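
The metadata reports quality as word error rate (WER). As a rough illustration of what that number measures, here is a minimal sketch of the standard WER definition: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions, and this is not necessarily the exact evaluation script (or text normalization) used to produce the reported value.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not taken from the dataset): one substitution and
# one deletion against a four-word reference.
print(word_error_rate("unë jam këtu sot", "unë jam aty"))
```

WER values are often reported multiplied by 100, so a reported figure near 20 would correspond to roughly one error for every five reference words, assuming the metadata follows that convention.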

This model is intended for automatic speech recognition (ASR) in Albanian and can be used in applications such as transcription, subtitling, and real-time speech processing.
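
For transcription, a minimal sketch using the `transformers` automatic-speech-recognition pipeline might look like the following. The repository id in `MODEL_ID` is a hypothetical placeholder, since this card does not state the model's Hub id, and the helper names `generation_options` and `transcribe` are illustrative. The sketch assumes `transformers`, a backend such as PyTorch, and ffmpeg (for decoding audio files) are installed.

```python
# Hypothetical placeholder: replace with the actual Hub repository id of this model.
MODEL_ID = "your-namespace/whisper-large-v3-turbo-shqip"


def generation_options() -> dict:
    """Generation settings that pin Whisper to Albanian transcription."""
    # "sq" matches the language code declared in the metadata.
    return {"language": "sq", "task": "transcribe"}


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe a short audio file and return the recognized text."""
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, generate_kwargs=generation_options())
    return result["text"]
```

Calling `transcribe("sample.wav")` with a real repository id would download the model on first use and return the transcript as a string. Clips longer than about 30 seconds typically need the pipeline's long-form options, which this sketch leaves out.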