Update README.md
README.md
CHANGED
@@ -49,7 +49,7 @@ TODO:
 
 ------------
 
-## T-MAC
+## llama.cpp Q4_K_M scheme and T-MAC inference -groupsize 128?
 
 | Model | Size | Params | Backend | Threads | Test | t/s (tokens/sec) |
 |-------------------------|---------|--------|---------|---------|--------|----------------------|
@@ -58,4 +58,5 @@ TODO:
 | qwen2 ?B INT_N Q4_K | 1.70 GiB| 3.40 B | CPU | 4 | pp512 | 59.66 ± 0.10 |
 | qwen2 ?B INT_N Q4_K | 1.70 GiB| 3.40 B | CPU | 4 | tg128 | 26.43 ± 0.14 |
 
-
+**It's 16.3% faster and 13% smaller.**
+- [Issue Link](https://github.com/microsoft/T-MAC/issues/79)