Update README.md
README.md CHANGED
@@ -5,7 +5,7 @@ metrics:
 base_model:
 - google/gemma-3-1b-it-qat-q4_0-gguf
 ---
-This is a requantized version of https://huggingface.co/google/gemma-3-1b-it-qat-q4_0-
+This is a requantized version of https://huggingface.co/google/gemma-3-1b-it-qat-q4_0-gguf.
 
 The official QAT weights released by google use fp16 (instead of Q6_K) for the embeddings table, which makes this model take a significant extra amount of memory (and storage) compared to what Q4_0 quants are supposed to take.
 ~~Instead of quantizing the table myself, I extracted it from Bartowski's quantized models, because those were already calibrated with imatrix, which should squeeze some extra performance out of it.~~
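The size claim in the README (fp16 vs Q6_K embeddings) is easy to check yourself: the `gguf` Python package that ships with llama.cpp can read a model's tensor metadata without loading the weights. A minimal sketch, assuming a local copy of the model; the file path is a placeholder, not a name from this repo:

```python
# Minimal sketch: report the quantization type and size of the embeddings
# table in a GGUF file, using the `gguf` package (pip install gguf).
from gguf import GGUFReader

reader = GGUFReader("gemma-3-1b-it-qat-q4_0.gguf")  # hypothetical local path

for tensor in reader.tensors:
    # `token_embd.weight` is the embeddings table; in the official QAT
    # release it reads as F16, while typical Q4_0 quants use Q6_K for it.
    if tensor.name == "token_embd.weight":
        print(tensor.name, tensor.tensor_type.name, tensor.n_bytes, "bytes")
```

To reproduce a requantization like this one, a reasonably recent build of llama.cpp's `llama-quantize` tool accepts `--allow-requantize` and a `--token-embedding-type` override (e.g. `q6_k`), which target exactly this tensor.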