---
library_name: transformers
datasets:
- cobrayyxx/FLEURS_INDO-ENG_Speech_Translation
language:
- id
- en
base_model:
- openai/whisper-small
pipeline_tag: audio-text-to-text
---

# Model Card for whisper-small-indo-transcription

## Model Details

### Model Description

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the [FLEURS Indonesian–English dataset](https://huggingface.co/datasets/cobrayyxx/FLEURS_INDO-ENG_Speech_Translation).

## Uses

This model predicts the transcription of Indonesian audio.

## How to Get Started with the Model

Use the code below to get started with the model.

1. Convert the model to the CTranslate2 format:

```
ct2-transformers-converter --model cobrayyxx/whisper-small-indo-transcription --output_dir cobrayyxx/whisper-small-indo-transcription-ct2 --copy_files tokenizer.json preprocessor_config.json --quantization float16
```

2. Load the converted model with faster-whisper:

```
from faster_whisper import WhisperModel

# Point WhisperModel at the CTranslate2 output directory from step 1.
model_transcribe = WhisperModel(
    "cobrayyxx/whisper-small-indo-transcription-ct2",
    device="cpu",
    compute_type="float32",
)
```

## Training Details

### Model Overview
- Framework: Hugging Face Transformers
- Training steps: 100
- Epochs: approximately 0.56
- Training loss: 0.3916
- Model purpose: Indonesian speech transcription

### Performance Metrics
- Train runtime: 458.31 seconds
- Train samples per second: 3.491
- Train steps per second: 0.218
- Total floating-point operations (FLOPs): 4.62 × 10^17

## Next Steps
- Evaluate this model.

## Citation

```
@misc{radford2022whisper,
  doi       = {10.48550/ARXIV.2212.04356},
  url       = {https://arxiv.org/abs/2212.04356},
  author    = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title     = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year      = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```
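As a programmatic alternative to invoking the CLI by hand, the conversion command from the quick-start steps can be assembled as an argument list (e.g. for `subprocess.run`). This is a minimal sketch using only the standard library; the model ID and output directory are taken verbatim from the conversion step above:

```python
import shlex

# Hugging Face model ID and CTranslate2 output directory from the quick-start.
model_id = "cobrayyxx/whisper-small-indo-transcription"
output_dir = "cobrayyxx/whisper-small-indo-transcription-ct2"

# Assemble the same ct2-transformers-converter invocation as an argv list,
# suitable for subprocess.run(cmd) without shell quoting pitfalls.
cmd = [
    "ct2-transformers-converter",
    "--model", model_id,
    "--output_dir", output_dir,
    "--copy_files", "tokenizer.json", "preprocessor_config.json",
    "--quantization", "float16",
]

print(shlex.join(cmd))
```

Passing the list form to `subprocess.run` avoids shell interpolation entirely; `shlex.join` is only used here to display the equivalent shell command.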