stduhpf committed
Commit cc4afca · verified · Parent: 8223929

Update README.md

Files changed (1)
  1. README.md +6 -2
README.md CHANGED
@@ -12,7 +12,7 @@ The official QAT weights released by google use fp16 (instead of Q6_K) for the e
 ~~Instead of quantizing the table myself, I extracted it from Bartowski's quantized models, because those were already calibrated with imatrix, which should squeeze some extra performance out of it.~~
 Requantizing with llama.cpp fixes that and gives better result than the other thing.
 
-Here are some benchmar results:
+Here are some benchmark results:
 
 | Model | File size ↓ | PPL (wiki.text.raw) ↓ | Hellaswag (first 4000 tasks, deterministic) ↑ |
 | --- | --- | --- | --- |
@@ -26,4 +26,8 @@ Here are some benchmar results:
 Note that this model ends up smaller than the Q4_0 from Bartowski. This is because llama.cpp sets some tensors to Q4_1 when quantizing models to Q4_0 with imatrix, but this is a static quant.
 The perplexity scores are within margin of error between this model and the original QAT, despite the size difference.
 
-The drop in Hellaswag score with the older version of the model is what made me realize there was probably something missing with my previous approach. It's much better now.
+The drop in Hellaswag score with the older version of the model is what made me realize there was probably something missing with my previous approach. It's much better now.
+
+I also fixed the control token metadata, which was slightly degrading the performance of the model in instruct mode. Shoutout to ngxson for [finding the issue](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-gguf/discussions/3#67f6a2e0207b4bceea793151),
+tdh111 for [making me aware of the issue](https://huggingface.co/stduhpf/google-gemma-3-27b-it-qat-q4_0-gguf-small/discussions/3#67f74fdf8411d4d6a82049db),
+and u/dampflokfreund on reddit ([Dampfinchen](https://huggingface.co/Dampfinchen) on Huggingface) for [sharing the steps to fix it](https://www.reddit.com/r/LocalLLaMA/comments/1jvi860/comment/mmcuvw2).
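
For context on the "requantizing with llama.cpp" line in the diff above: a minimal sketch of how such a requantization can be driven from Python, assuming a llama.cpp build that provides the `llama-quantize` tool; the file names are hypothetical placeholders, not the actual files from this repo.

```python
# Hedged sketch: requantize a QAT GGUF to a fully Q4_0 file with
# llama.cpp's llama-quantize tool. File names are placeholders.
import subprocess

subprocess.run(
    [
        "./llama-quantize",
        "--allow-requantize",  # input already contains quantized tensors
        "gemma-3-27b-it-qat-q4_0.gguf",        # source weights (fp16 embedding table)
        "gemma-3-27b-it-qat-q4_0-small.gguf",  # output with the embedding table quantized too
        "Q4_0",                                # target quantization type
    ],
    check=True,
)
```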
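
The note in the diff about llama.cpp mixing Q4_1 tensors into imatrix Q4_0 quants is easy to verify yourself. Below is a hedged sketch using the `gguf` Python package that ships with llama.cpp (the path is a placeholder) to count each tensor type in a file; a static quant should come out as pure Q4_0 apart from the output/embedding tensors.

```python
# Hedged sketch: tally per-tensor quantization types in a GGUF file,
# using the gguf Python package from the llama.cpp repository.
from collections import Counter

from gguf import GGUFReader

reader = GGUFReader("model-q4_0.gguf")  # placeholder path
counts = Counter(t.tensor_type.name for t in reader.tensors)
for tensor_type, n in sorted(counts.items()):
    print(f"{tensor_type}: {n} tensors")
```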
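
On the control token fix mentioned in the diff: the linked discussions contain the actual steps. Purely as an illustration, here is a hedged sketch of how the relevant tokenizer metadata can be inspected with the same `gguf` package; the `tokenizer.ggml.*` keys are standard GGUF metadata names, but treat the field-access details as an assumption that may vary between gguf-py versions.

```python
# Hedged sketch: print control-token metadata from a GGUF file so a
# wrong BOS/EOS id can be spotted. Field access follows gguf-py's
# ReaderField layout (value parts indexed by `data`).
from gguf import GGUFReader

reader = GGUFReader("model-q4_0.gguf")  # placeholder path
for key in ("tokenizer.ggml.bos_token_id", "tokenizer.ggml.eos_token_id"):
    field = reader.fields.get(key)
    if field is not None:
        print(key, field.parts[field.data[0]])
```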