Update README.md
README.md (CHANGED)
@@ -21,4 +21,8 @@ Here are some benchmark results:
 | [QAT Q4_0 (google)](https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-gguf/blob/main/gemma-3-27b-it-q4_0.gguf) | 17.2 GB | 8.2323 +/- 0.06320 | 82.850% [81.6505%, 83.9865%] |
 
 Note that this model ends up smaller than the Q4_0 from Bartowski. This is because llama.cpp sets some tensors to Q4_1 when quantizing models to Q4_0 with imatrix, but this is a static quant.
 
-The perplexity score for this one is even lower with this model compared to the original model by Google, but the results are within margin of error, so it's probably just luck.
+The perplexity score is even lower for this model than for the original one by Google, but the results are within the margin of error, so it's probably just luck.
+
+I also fixed the control token metadata, which was slightly degrading the performance of the model in instruct mode. Shoutout to ngxson for [finding the issue](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-gguf/discussions/3#67f6a2e0207b4bceea793151),
+tdh111 for [making me aware of the issue](https://huggingface.co/stduhpf/google-gemma-3-27b-it-qat-q4_0-gguf-small/discussions/3#67f74fdf8411d4d6a82049db),
+and u/dampflokfreund on reddit ([Dampfinchen](https://huggingface.co/Dampfinchen) on Huggingface) for [sharing the steps to fix it](https://www.reddit.com/r/LocalLLaMA/comments/1jvi860/comment/mmcuvw2).
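
For context on the figures above: the size difference and the perplexity scores in the table are the kind of output produced by llama.cpp's `llama-quantize` and `llama-perplexity` tools. The sketch below only illustrates how a static Q4_0 quant and a perplexity number are typically obtained; it is not necessarily the exact procedure used for this model, and all file paths are placeholders.

```python
# Rough sketch of producing a static Q4_0 quant and a perplexity figure with
# llama.cpp's CLI tools. Assumes a local llama.cpp build; paths are placeholders.
import subprocess

SOURCE_GGUF = "gemma-3-27b-it-bf16.gguf"      # full-precision source model (placeholder)
Q4_0_GGUF   = "gemma-3-27b-it-q4_0.gguf"      # output of the static quant
EVAL_TEXT   = "wikitext-2-raw/wiki.test.raw"  # evaluation text (placeholder)

# Static quant: no --imatrix is passed, so llama.cpp keeps the quantized tensors
# at Q4_0 rather than promoting some of them to Q4_1 (per the note in the diff),
# which is why the resulting file stays smaller.
subprocess.run(["./llama-quantize", SOURCE_GGUF, Q4_0_GGUF, "Q4_0"], check=True)

# Perplexity over the evaluation text; the "PPL +/- error" value reported in the
# table comes from a run like this one.
subprocess.run(["./llama-perplexity", "-m", Q4_0_GGUF, "-f", EVAL_TEXT], check=True)
```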
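One way to check the control token metadata mentioned in the added lines is to read the tokenizer fields straight from the GGUF file with the `gguf` Python package that ships with llama.cpp. This is a minimal inspection sketch, not the exact fix applied here; the `GGUFReader` field-access pattern and the token-type code for control tokens are assumptions, and the file path is a placeholder.

```python
# Minimal sketch: inspect whether Gemma's turn markers are flagged as control
# tokens in a GGUF file, using gguf-py's GGUFReader. The parts/data field layout
# and the convention that token type 3 means "control" are assumptions based on
# gguf-py and llama.cpp; the path below is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("gemma-3-27b-it-q4_0.gguf")

tokens = reader.fields["tokenizer.ggml.tokens"]      # token strings
types  = reader.fields["tokenizer.ggml.token_type"]  # per-token type codes

for token_id, (tok_idx, typ_idx) in enumerate(zip(tokens.data, types.data)):
    text = bytes(tokens.parts[tok_idx]).decode("utf-8", errors="replace")
    kind = int(types.parts[typ_idx][0])
    if text in ("<start_of_turn>", "<end_of_turn>"):
        # These turn markers should be typed as control tokens (3); a different
        # value here is the kind of metadata issue the fix above addresses.
        print(f"id={token_id} token={text!r} type={kind}")
```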