Outlook

We have quantised the model to 2-bit (GGUF TQ2_0) so it can be served at scale on low-end GPU cards. The quantisation was performed with the llama.cpp library.

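As a rough sketch of how the quantised file can be loaded for inference, the snippet below uses the llama-cpp-python bindings for llama.cpp. The GGUF filename pattern, context size, and generation parameters are assumptions (not taken from this repo) and should be adjusted to the file actually published here.

```python
# Minimal sketch: load the 2-bit GGUF with llama-cpp-python (bindings for llama.cpp).
# Filename pattern and parameters below are assumptions; check the repo's file list.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="sleeping-ai/Gemma3-27B-IT-TQ2-0",
    filename="*TQ2_0.gguf",   # assumed glob for the quantised file in this repo
    n_ctx=4096,               # context window; lower it to save memory
    n_gpu_layers=-1,          # offload all layers to the GPU if VRAM allows
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What does 2-bit quantisation trade off?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```
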
GGUF
Model size: 27B params
Architecture: gemma3
Quantisation: 2-bit (TQ2_0)

Model tree for sleeping-ai/Gemma3-27B-IT-TQ2-0
This model is a quantised variant of Gemma3-27B-IT.