---
base_model: google/gemma-2-9b-it
library_name: transformers
license: gemma
pipeline_tag: text-generation
tags:
- conversational
- llama-cpp
- matrixportal
extra_gated_heading: Access Gemma on Hugging Face
extra_gated_prompt: >-
  To access Gemma on Hugging Face, you’re required to review and agree to
  Google’s usage license. To do this, please ensure you’re logged in to Hugging
  Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
---

# matrixportal/gemma-2-9b-it-GGUF
This model was converted to GGUF format from google/gemma-2-9b-it using llama.cpp via ggml.ai's all-gguf-same-where space. Refer to the original model card for more details on the model.
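
A minimal sketch of loading one of these GGUF files with the llama-cpp-python bindings. The exact `.gguf` filename below is an assumption for illustration; copy the real name from this repo's file list:

```python
# pip install llama-cpp-python huggingface_hub
from llama_cpp import Llama

# Pull a quantized file straight from this repo and load it.
# NOTE: the filename is assumed for illustration; use the actual
# .gguf filename listed in the repository.
llm = Llama.from_pretrained(
    repo_id="matrixportal/gemma-2-9b-it-GGUF",
    filename="gemma-2-9b-it-Q4_K_M.gguf",  # assumed filename
    n_ctx=4096,  # context window; raise it if you need longer prompts
)

# Gemma-2 is an instruction-tuned chat model, so use the chat API.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```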
## ✅ Quantized Models Download List
### 🔍 Recommended Quantizations

- ✨ **General CPU Use:** `Q4_K_M` (best balance of speed and quality; download sketch below)
- 📱 **ARM Devices:** `Q4_0` (optimized for ARM CPUs)
- 🏆 **Maximum Quality:** `Q8_0` (near-original quality)
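
For example, a minimal sketch of fetching the recommended file with `huggingface_hub`; the `.gguf` filename is an assumption, so check the repo's file list for the exact name:

```python
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Download one quantized file from this repo into the local HF cache
# and get its path back.
# NOTE: the filename is assumed for illustration.
path = hf_hub_download(
    repo_id="matrixportal/gemma-2-9b-it-GGUF",
    filename="gemma-2-9b-it-Q4_K_M.gguf",  # assumed filename
)
print(path)  # pass this path to llama.cpp, e.g. llama-cli -m <path>
```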
### 📦 Full Quantization Options
| 🚀 Download | 🔢 Type | 📝 Notes |
|---|---|---|
| Download | `Q2_K` | Basic quantization |
| Download | `Q3_K_S` | Small size |
| Download | `Q3_K_M` | Balanced quality |
| Download | `Q3_K_L` | Better quality |
| Download | `Q4_0` | Fast on ARM |
| Download | `Q4_K_S` | Fast, recommended |
| Download | `Q4_K_M` | Best balance |
| Download | `Q5_0` | Good quality |
| Download | `Q5_K_S` | Balanced |
| Download | `Q5_K_M` | High quality |
| Download | `Q6_K` | Very good quality |
| Download | `Q8_0` | Fast, best quality |
| Download | `F16` | Maximum accuracy |
💡 **Tip:** Use `F16` for maximum precision when quality is critical.