AventIQ-AI
/

whisper-Automatic-AI-transcriptionist

Safetensors

whisper

Model card Files Files and versions Community

developerPushkal commited on Apr 3

Commit

104b157

verified ·

1 Parent(s): 0fa0eaa

Create README.md

Browse files

Files changed (1) hide show

README.md +100 -0

README.md ADDED Viewed

	@@ -0,0 +1,100 @@

+# OpenAI Whisper-Base Fine-Tuned Model for AI-transcriptionist
+This repository hosts a fine-tuned version of the OpenAI Whisper-Base model optimized for AI-transcriptionist tasks using the [Mozilla Common Voice 13.0](https://commonvoice.mozilla.org/) dataset. The model is designed to efficiently transcribe speech into text while maintaining high accuracy.
+## Model Details
+- **Model Architecture**: OpenAI Whisper-Base
+- **Task**: AI-transcriptionist
+- **Dataset**: [Mozilla Common Voice 11.0](https://commonvoice.mozilla.org/)
+- **Fine-tuning Framework**: Hugging Face Transformers
+## 🚀 Usage
+### Installation
+```bash
+pip install transformers torch
+```
+### Loading the Model
+```python
+from transformers import WhisperProcessor, WhisperForConditionalGeneration
+import torch
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model_name = "AventIQ-AI/whisper-AI-transcriptionist"
+model = WhisperForConditionalGeneration.from_pretrained(model_name).to(device)
+processor = WhisperProcessor.from_pretrained(model_name)
+```
+### Speech-to-Text Inference
+```python
+import torchaudio
+# Load and process audio file
+def load_audio(file_path, target_sampling_rate=16000):
+    # Load audio file
+    waveform, sample_rate = torchaudio.load(file_path)
+    # Convert to mono if stereo
+    if waveform.shape[0] > 1:
+        waveform = waveform.mean(dim=0, keepdim=True)
+    # Resample if needed
+    if sample_rate != target_sampling_rate:
+        waveform = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=target_sampling_rate)(waveform)
+    return waveform.squeeze(0).numpy()
+input_audio_path = "/kaggle/input/test-data-2/Friday 4h04m pm.m4a"  # Change this to your audio file
+audio_array = load_audio(input_audio_path)
+input_features = processor(audio_array, sampling_rate=16000, return_tensors="pt").input_features
+input_features = input_features.to(device)
+forced_decoder_ids = processor.get_decoder_prompt_ids(language="en", task="transcribe")
+with torch.no_grad():
+    predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
+# Decode output
+transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
+print(f"Transcribed Text: {transcription}")
+```
+## 📊 Evaluation Results
+After fine-tuning the Whisper-Base model for speech-to-text, we evaluated the model's performance on the validation set from the Common Voice 11.0 dataset. The following results were obtained:
+| Metric      | Score  | Meaning |
+|------------|--------|------------------------------------------------|
+| **WER**    | 9.2%   | Word Error Rate: Measures transcription accuracy |
+| **CER**    | 5.5%   | Character Error Rate: Measures character-level accuracy |
+## Fine-Tuning Details
+### Dataset
+The Mozilla Common Voice 11.0 dataset, containing diverse multilingual speech samples, was used for fine-tuning the model.
+### Training
+- **Number of epochs**: 6
+- **Batch size**: 16
+- **Evaluation strategy**: epochs
+- **Learning Rate**: 5e-6
+## 📂 Repository Structure
+```bash
+.
+├── model/               # Contains the quantized model files
+├── tokenizer_config/    # Tokenizer configuration and vocabulary files
+├── model.safetensors/   # Quantized Model
+├── README.md            # Model documentation
+```
+## ⚠️ Limitations
+- The model may struggle with highly noisy or overlapping speech.
+- Performance may vary across different accents and dialects.
+## 🤝 Contributing
+Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.