---
language:
- ml
tags:
- audio
- automatic-speech-recognition
- vegam
license: mit
datasets:
- google/fleurs
- thennal/IMaSC
- mozilla-foundation/common_voice_11_0
library_name: ctranslate2
---

> Note: Model file size is 3.06 GB

# vegam-whisper-medium-ml (വേഗം)

This is a conversion of [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml) to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format.

This model can be used in CTranslate2 or in projects based on CTranslate2, such as [faster-whisper](https://github.com/guillaumekln/faster-whisper).

## Installation

- Install [faster-whisper](https://github.com/guillaumekln/faster-whisper). See the [faster-whisper installation instructions](https://github.com/guillaumekln/faster-whisper/tree/master#installation) for more details.

```
pip install faster-whisper
```

- Install [git-lfs](https://git-lfs.com/), which is needed only to download the model from Hugging Face. The command below is for Debian-based systems; for other systems, see the [git-lfs installation instructions](https://github.com/git-lfs/git-lfs?utm_source=gitlfs_site&utm_medium=installation_link&utm_campaign=gitlfs#installing).

```
apt-get install git-lfs
```

- Download the model weights:

```
git lfs install
git clone https://huggingface.co/kurianbenoy/vegam-whisper-medium-ml
```

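If the repository is cloned before `git lfs install` has been run, large files may arrive as small Git LFS pointer files instead of the real weights. The sketch below is one possible way to check for that. It assumes the weights file is named `model.bin` (the name CTranslate2 converters normally write); the helper names `is_lfs_pointer` and `weights_downloaded` are hypothetical and not part of any library.

```python
from pathlib import Path

# First line of a Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like an unfetched Git LFS pointer file."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def weights_downloaded(model_dir, weights_name="model.bin"):
    """Return True if the weights file exists and is not an LFS pointer.

    `weights_name` is an assumption: "model.bin" is the usual CTranslate2
    file name; adjust it if the repository layout differs.
    """
    weights = Path(model_dir) / weights_name
    return weights.is_file() and not is_lfs_pointer(weights)
```

For example, `weights_downloaded("vegam-whisper-medium-ml")` should return `True` after a successful clone; if it returns `False`, running `git lfs pull` inside the cloned directory is the usual way to fetch the real files.
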
## Usage

```
from faster_whisper import WhisperModel

model_path = "vegam-whisper-medium-ml"

# Run on GPU with FP16
model = WhisperModel(model_path, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_path, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_path, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```

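The three `device` / `compute_type` combinations in the snippet above can be selected programmatically rather than by commenting lines in and out. The following is a minimal sketch under that assumption; `pick_compute_config` and `cuda_available` are hypothetical helper names, and the GPU check relies on `ctranslate2.get_cuda_device_count()`, falling back to CPU if CTranslate2 cannot be imported.

```python
def cuda_available():
    """Best-effort check for a CUDA device visible to CTranslate2."""
    try:
        import ctranslate2
    except ImportError:
        return False
    return ctranslate2.get_cuda_device_count() > 0


def pick_compute_config(use_gpu, use_int8=False):
    """Map two flags to the (device, compute_type) pairs from the usage snippet."""
    if use_gpu:
        return ("cuda", "int8_float16" if use_int8 else "float16")
    # The usage snippet lists only INT8 for CPU, so CPU always maps to "int8".
    return ("cpu", "int8")
```

A possible way to use it: `device, compute_type = pick_compute_config(cuda_available())`, followed by `WhisperModel(model_path, device=device, compute_type=compute_type)`.
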
## Example

```
from faster_whisper import WhisperModel

model_path = "vegam-whisper-medium-ml"

model = WhisperModel(model_path, device="cuda", compute_type="float16")

segments, info = model.transcribe("00b38e80-80b8-4f70-babf-566e848879fc.webm", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```

> Detected language 'ta' with probability 0.353516

> [0.00s -> 4.74s] പാലം കടുക്കുവോളം നാരായണ പാലം കടന്നാലൊ കൂരായണ

Note: The audio file [00b38e80-80b8-4f70-babf-566e848879fc.webm](https://huggingface.co/kurianbenoy/vegam-whisper-medium-ml/blob/main/00b38e80-80b8-4f70-babf-566e848879fc.webm) is from the [Malayalam Speech Corpus](https://blog.smc.org.in/malayalam-speech-corpus/) and is stored alongside the model weights.

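In the example output, the detected language is Tamil (`ta`) at a probability of about 0.35, even though the clip is Malayalam. One possible way to avoid relying on automatic detection is to pass `language="ml"` to `transcribe`. The sketch below assumes `model` is a `faster_whisper.WhisperModel` loaded as in the example; `format_segment` and `transcribe_malayalam` are hypothetical helper names introduced here for illustration.

```python
def format_segment(start, end, text):
    """Format one segment the same way as the loop in the example."""
    return "[%.2fs -> %.2fs] %s" % (start, end, text)


def transcribe_malayalam(model, audio_path, beam_size=5):
    """Transcribe with the language fixed to Malayalam ("ml").

    `model` is expected to be a faster_whisper.WhisperModel. Passing
    `language` means the language is given rather than detected.
    """
    segments, _info = model.transcribe(
        audio_path, beam_size=beam_size, language="ml"
    )
    # `segments` is an iterable of objects with start, end and text
    # attributes; collect the formatted lines into a list.
    return [format_segment(s.start, s.end, s.text) for s in segments]
```

Calling `transcribe_malayalam(model, "00b38e80-80b8-4f70-babf-566e848879fc.webm")` would then return the formatted lines as a list instead of printing them.
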
## Conversion Details

This conversion was made possible by the wonderful [CTranslate2 library](https://github.com/OpenNMT/CTranslate2), using its [Transformers converter for OpenAI Whisper](https://opennmt.net/CTranslate2/guides/transformers.html#whisper). The original model was converted with the following command:

```
ct2-transformers-converter --model thennal/whisper-medium-ml --output_dir vegam-whisper-medium-ml
```

## Many Thanks to

- Creators of CTranslate2 and faster-whisper
- Thennal D K
- Santhosh Thottingal