Update README.md

--- a/README.md
+++ b/README.md
@@ -36,7 +36,7 @@ TODO:
 | Benchmark 2 | 50 | 4.45 |

 ------------
-## Intermediate
+## Intermediate Sizes
 | Model Architecture | Intermediate Size |
 |--------------------------|-------------------|
 | Llama2 7B | 11,008 |
@@ -64,3 +64,6 @@ TODO:
 - [Issue Link](https://github.com/microsoft/T-MAC/issues/79)

 AutoGPTQ is used, by default it uses groupsize of 128: making it less bpw and smaller than llama.cpp. https://qwen.readthedocs.io/en/latest/quantization/gptq.html
+
+- The Kquant-series isn't optimized for efficiency, it is meant for quality
+- Q4_0 will use hardware accelerated dot-product instructions, using quantized-on-the-fly intermediate activations and weights.
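The AutoGPTQ note in the diff claims that group size 128 gives fewer bits per weight (bpw) than llama.cpp. A back-of-the-envelope sketch of why, assuming a hypothetical per-group layout of 4-bit weights plus one fp16 scale and one 4-bit zero point (the exact packing varies by backend):

```python
# Rough bits-per-weight (bpw) estimate for group-quantized 4-bit weights.
# Assumed layout (illustrative, not any specific packer's exact format):
# each group of `group_size` weights shares one scale and one zero point.
def quant_bpw(bits=4, group_size=128, scale_bits=16, zero_bits=4):
    return bits + (scale_bits + zero_bits) / group_size

# GPTQ-style: 4-bit, group size 128, fp16 scale + 4-bit zero.
print(quant_bpw())  # ~4.16 bpw

# Q4_0-style: 4-bit, blocks of 32, one fp16 scale, no zero point.
print(quant_bpw(group_size=32, zero_bits=0))  # 4.5 bpw
```

Under these assumptions the larger group amortizes its metadata over more weights (~4.16 vs 4.5 bpw), which matches the direction of the claim even though the exact numbers depend on the packing format.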
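To make the Q4_0 bullet concrete, here is a minimal Python sketch of Q4_0-style block quantization, modeled loosely on llama.cpp's format (blocks of 32 values, one fp16 scale per block, 4-bit codes offset by 8). This is an illustration of the scheme, not the ggml implementation; in llama.cpp the activations are also quantized to an 8-bit block format on the fly so the dot product can run on integer SIMD instructions.

```python
import numpy as np

BLOCK = 32  # values per quantization block

def q4_0_quantize(block):
    """Quantize one block of 32 floats to a fp16 scale + 4-bit codes."""
    # Pick the scale so the largest-magnitude value maps to -8 (code 0).
    amax_idx = np.argmax(np.abs(block))
    d = np.float16(block[amax_idx] / -8.0)
    if d == 0:  # all-zero block
        return d, np.full(BLOCK, 8, dtype=np.uint8)
    q = np.clip(np.round(block / np.float32(d)) + 8, 0, 15).astype(np.uint8)
    return d, q

def q4_0_dequantize(d, q):
    """Reconstruct approximate floats from scale + 4-bit codes."""
    return (q.astype(np.float32) - 8) * np.float32(d)

# Round-trip a random block and measure the worst-case error.
rng = np.random.default_rng(0)
x = rng.standard_normal(BLOCK).astype(np.float32)
d, q = q4_0_quantize(x)
err = float(np.max(np.abs(q4_0_dequantize(d, q) - x)))
```

The reconstruction error stays on the order of half the block scale `d`, which is why picking one scale per small block (rather than per tensor) keeps 4-bit quantization usable.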