imi2 committed
Commit e671a99 · verified · 1 parent: 95260dd

Update README.md

Files changed (1): README.md (+4 -1)
README.md CHANGED
@@ -36,7 +36,7 @@ TODO:
 | Benchmark 2 | 50 | 4.45 |

 ------------
-## Intermediate Layer Sizes
+## Intermediate Sizes
 | Model Architecture | Intermediate Size |
 |--------------------------|-------------------|
 | Llama2 7B | 11,008 |
@@ -64,3 +64,6 @@ TODO:
 - [Issue Link](https://github.com/microsoft/T-MAC/issues/79)

 AutoGPTQ is used, by default it uses groupsize of 128: making it less bpw and smaller than llama.cpp. https://qwen.readthedocs.io/en/latest/quantization/gptq.html
+
+- The Kquant-series isn't optimized for efficiency, it is meant for quality
+- Q4_0 will use hardware accelerated dot-product instructions, using quantized-on-the-fly intermediate activations and weights.
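The "less bpw than llama.cpp" claim in the diff above can be sanity-checked with quick arithmetic: effective bits-per-weight is the payload bits plus the per-group metadata amortized over the group. The storage layouts below are assumptions (GPTQ counted as 4-bit weights plus an fp16 scale and a 4-bit zero point per group of 128; Q4_0 as 4-bit weights plus an fp16 scale per block of 32); real formats may differ in detail.

```python
# Rough bits-per-weight (bpw) comparison: GPTQ group size 128 vs llama.cpp Q4_0.
# Layouts are assumptions, not the exact on-disk formats.

def bpw(weight_bits: int, group_size: int, overhead_bits: int) -> float:
    """Effective bits per weight: payload plus per-group metadata amortized."""
    return weight_bits + overhead_bits / group_size

gptq_bpw = bpw(4, 128, 16 + 4)   # assumed fp16 scale + 4-bit zero per 128 weights
q4_0_bpw = bpw(4, 32, 16)        # fp16 scale per block of 32 weights

print(f"GPTQ g=128: {gptq_bpw:.3f} bpw")   # 4.156
print(f"Q4_0:       {q4_0_bpw:.3f} bpw")   # 4.500
```

Under these assumptions the group-size-128 GPTQ model comes out around 4.16 bpw versus 4.5 bpw for Q4_0, which is consistent with the smaller file size the README notes.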
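The final added bullet (Q4_0 using integer dot-product instructions on quantized-on-the-fly activations) can be sketched as toy code: weights sit in 4-bit blocks with a per-block scale, activations are quantized to int8 per block at runtime, and the inner loop is an integer dot product rescaled to float. Block size and rounding only loosely mirror llama.cpp's Q4_0/Q8_0 scheme; this is an illustration, not the real kernel.

```python
# Toy blockwise Q4-weight x Q8-activation dot product (illustrative only).

BLOCK = 32  # elements per quantization block, as in Q4_0/Q8_0

def quantize_q4(block):
    """Symmetric 4-bit quantization: map values onto integers in [-8, 7]."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 7.0
    q = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, q

def quantize_q8(block):
    """Symmetric 8-bit quantization for activations, done on the fly."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 127.0
    q = [max(-128, min(127, round(x / scale))) for x in block]
    return scale, q

def dot_q4_q8(weights, activations):
    """Blockwise integer dot product, rescaled to float once per block."""
    total = 0.0
    for i in range(0, len(weights), BLOCK):
        w_scale, wq = quantize_q4(weights[i:i + BLOCK])
        a_scale, aq = quantize_q8(activations[i:i + BLOCK])
        # On real hardware this inner sum maps to int8 dot-product instructions.
        total += w_scale * a_scale * sum(w * a for w, a in zip(wq, aq))
    return total
```

The key point the bullet makes is that only the integer sum runs in the hot loop; the float scales are applied once per block, which is what lets the kernel use hardware int8 dot-product instructions.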