helcig committed on
Commit 8601592 · verified · 1 Parent(s): 4ed2d4b

Update README.md

Files changed (1):
  1. README.md (+30 −1)

README.md CHANGED
@@ -3,4 +3,33 @@ base_model:
  - meta-llama/Llama-3.1-8B-Instruct
  ---

- See [GGUF Toolkit repo](https://github.com/IST-DASLab/gguf-toolkit)
# Llama-3.1-8B-Instruct GGUF DASLab Quantization

This repository contains advanced quantized versions of Llama 3.1 8B Instruct using **GPTQ quantization** and **GPTQ+EvoPress optimization** from the [DASLab GGUF Toolkit](https://github.com/IST-DASLab/gguf-toolkit).

## Models

- **GPTQ Uniform**: high-quality GPTQ quantization at 2-6 bit precision
- **GPTQ+EvoPress**: non-uniform per-layer quantization discovered via evolutionary search

## Performance

Our GPTQ-based quantization methods achieve **superior quality-compression tradeoffs** compared to standard quantization:

- **Better perplexity** at equivalent bitwidths vs. naive quantization approaches
- **Error-correcting updates** during calibration for improved accuracy
- **Optimized configurations** that allocate bits based on layer sensitivity (EvoPress)

| Method | Avg Bits | C4 PPL | WikiText2 PPL |
|-----------------|----------|--------|---------------|
| GPTQ-4 | 4.50 | 11.35 | 6.89 |
| EvoPress-GPTQ-4 | 4.50 | 11.35 | 6.89 |
| EvoPress-GPTQ-5 | 5.51 | 11.13 | 6.79 |

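As a rough sanity check on the "Avg Bits" column, here is a back-of-envelope size estimate. The ~8.0B parameter count and the 16-bit baseline are illustrative assumptions, not numbers taken from this repository:

```python
# Back-of-envelope weight-storage estimate from average bits per weight.
# Assumptions (not from this repo): ~8.0e9 parameters, bf16 (16-bit) baseline.
PARAMS = 8.0e9
BASELINE_BITS = 16

def approx_size_gb(avg_bits: float, params: float = PARAMS) -> float:
    """Approximate weight-storage size in gigabytes (1 GB = 1e9 bytes)."""
    return params * avg_bits / 8 / 1e9

def compression_ratio(avg_bits: float, baseline: int = BASELINE_BITS) -> float:
    """How much smaller than the 16-bit baseline the quantized weights are."""
    return baseline / avg_bits

for name, bits in [("GPTQ-4", 4.50), ("EvoPress-GPTQ-5", 5.51)]:
    print(f"{name}: ~{approx_size_gb(bits):.1f} GB, {compression_ratio(bits):.2f}x vs 16-bit")
# GPTQ-4: ~4.5 GB, 3.56x vs 16-bit
# EvoPress-GPTQ-5: ~5.5 GB, 2.90x vs 16-bit
```

Under these assumptions, the 4.50-bit model needs roughly 4.5 GB for weights, versus ~16 GB at bf16.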
## Usage

Compatible with llama.cpp and all GGUF-supporting inference engines. No special setup required.
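For example, a minimal sketch that composes a llama.cpp `llama-cli` invocation for one of these files; the GGUF filename is hypothetical, so substitute the actual file you downloaded from this repository:

```python
# Sketch: build a llama.cpp CLI command for a quantized GGUF file.
# The filename below is hypothetical; use the real .gguf from this repo.
import shlex

def llama_cli_command(gguf_path: str, prompt: str, n_predict: int = 128) -> str:
    """Compose a llama-cli invocation: -m model, -p prompt, -n tokens to generate."""
    args = ["llama-cli", "-m", gguf_path, "-p", prompt, "-n", str(n_predict)]
    return shlex.join(args)  # shell-quotes the prompt safely

cmd = llama_cli_command("Llama-3.1-8B-Instruct-gptq-4bit.gguf",
                        "Explain GPTQ quantization in one sentence.")
print(cmd)
```

Run the printed command in a shell where llama.cpp is installed; `-m`, `-p`, and `-n` are standard `llama-cli` options.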
**Full documentation, evaluation results, and toolkit source**: https://github.com/IST-DASLab/gguf-toolkit

---