Why is the model so large?

#12 opened by WangBicheng

The original whisper-large-v3 model is about 1.5GB.
Why is the fine-tuned model so large?
Does that mean we need more VRAM for deploying it?

BELLE-2 Group // Be Everyone's Large Language model Engine org

Thanks for your question!

Actually, the original Whisper-large-v3 model in FP32 precision is around 6GB, not 1.5GB. The 1.5GB figure you mention most likely refers to an INT8-quantized version (FP16 would be about 3GB).

When fine-tuning models like Whisper, we usually work with the full-precision version (FP32), which explains the larger file size.
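As a rough sanity check, the file size follows directly from the parameter count times the bytes per parameter. The sketch below assumes ~1.55B parameters for Whisper-large-v3 and ignores non-weight overhead, so the numbers are approximate:

```python
# Rough model-size arithmetic for Whisper-large-v3 (~1.55B parameters).
# Exact on-disk size varies with metadata and tensor layout.
params = 1.55e9

for name, bytes_per_param in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    size_gb = params * bytes_per_param / 1e9
    print(f"{name}: ~{size_gb:.1f} GB")

# FP32: ~6.2 GB
# FP16: ~3.1 GB
# INT8: ~1.6 GB
```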

If you're concerned about VRAM usage during deployment or inference:

✅ You can use lower-precision versions, such as:

- FP16: ~3GB
- INT8: ~1.5GB
Libraries like faster-whisper provide support for loading and running these quantized models efficiently.
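For example, here is a minimal sketch of INT8 inference with faster-whisper, assuming you first convert the fine-tuned checkpoint to CTranslate2 format; the repo ID, output directory, and audio file below are placeholders for your own setup:

```python
# Convert the fine-tuned checkpoint to CTranslate2 format with INT8
# quantization (run once, paths are placeholders):
#   ct2-transformers-converter --model <this-repo-id> \
#       --output_dir whisper-large-v3-ct2-int8 --quantization int8

from faster_whisper import WhisperModel

# compute_type="int8" keeps the weight footprint near ~1.5GB.
model = WhisperModel(
    "whisper-large-v3-ct2-int8", device="cuda", compute_type="int8"
)

segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```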
