Update README.md

--- a/README.md
+++ b/README.md
@@ -36,7 +36,7 @@ TODO:
 | Benchmark 2 | 50 | 4.45 |

 ------------
-## Intermediate
+## Intermediate Sizes
 | Model Architecture | Intermediate Size |
 |--------------------------|-------------------|
 | Llama2 7B | 11,008 |
@@ -64,3 +64,6 @@ TODO:
 - [Issue Link](https://github.com/microsoft/T-MAC/issues/79)

 AutoGPTQ is used, by default it uses groupsize of 128: making it less bpw and smaller than llama.cpp. https://qwen.readthedocs.io/en/latest/quantization/gptq.html
+
+- The Kquant-series isn't optimized for efficiency, it is meant for quality
+- Q4_0 will use hardware accelerated dot-product instructions, using quantized-on-the-fly intermediate activations and weights.
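The AutoGPTQ note in the diff claims that group size 128 gives fewer bits per weight (bpw) than llama.cpp. A back-of-the-envelope sketch of why, assuming a hypothetical per-group layout of 4-bit weights plus one fp16 scale and one 4-bit zero point (the exact packing varies by backend):

```python
# Rough bits-per-weight (bpw) estimate for group-quantized 4-bit weights.
# Assumed layout (illustrative, not any specific packer's exact format):
# each group of `group_size` weights shares one scale and one zero point.
def quant_bpw(bits=4, group_size=128, scale_bits=16, zero_bits=4):
    return bits + (scale_bits + zero_bits) / group_size

# GPTQ-style: 4-bit, group size 128, fp16 scale + 4-bit zero.
print(quant_bpw())  # ~4.16 bpw

# Q4_0-style: 4-bit, blocks of 32, one fp16 scale, no zero point.
print(quant_bpw(group_size=32, zero_bits=0))  # 4.5 bpw
```

Under these assumptions the larger group amortizes its metadata over more weights (~4.16 vs 4.5 bpw), which matches the direction of the claim even though the exact numbers depend on the packing format.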
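To make the Q4_0 bullet concrete, here is a minimal Python sketch of Q4_0-style block quantization, modeled loosely on llama.cpp's format (blocks of 32 values, one fp16 scale per block, 4-bit codes offset by 8). This is an illustration of the scheme, not the ggml implementation; in llama.cpp the activations are also quantized to an 8-bit block format on the fly so the dot product can run on integer SIMD instructions.

```python
import numpy as np

BLOCK = 32  # values per quantization block

def q4_0_quantize(block):
    """Quantize one block of 32 floats to a fp16 scale + 4-bit codes."""
    # Pick the scale so the largest-magnitude value maps to -8 (code 0).
    amax_idx = np.argmax(np.abs(block))
    d = np.float16(block[amax_idx] / -8.0)
    if d == 0:  # all-zero block
        return d, np.full(BLOCK, 8, dtype=np.uint8)
    q = np.clip(np.round(block / np.float32(d)) + 8, 0, 15).astype(np.uint8)
    return d, q

def q4_0_dequantize(d, q):
    """Reconstruct approximate floats from scale + 4-bit codes."""
    return (q.astype(np.float32) - 8) * np.float32(d)

# Round-trip a random block and measure the worst-case error.
rng = np.random.default_rng(0)
x = rng.standard_normal(BLOCK).astype(np.float32)
d, q = q4_0_quantize(x)
err = float(np.max(np.abs(q4_0_dequantize(d, q) - x)))
```

The reconstruction error stays on the order of half the block scale `d`, which is why picking one scale per small block (rather than per tensor) keeps 4-bit quantization usable.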