Doctor-Shotgun committed
Commit a368527
1 Parent(s): 12a9f17
Update README.md
README.md CHANGED

@@ -20,6 +20,7 @@ Exllama v2 quant of [Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b](https://huggi
 
 Branches:
 - main: measurement.json calculated at 2048 token calibration rows on PIPPA
--
+- 4.65bpw-h6: 4.65 decoder bits per weight, 6 head bits
+- ideal for 24gb GPUs at 8k context (on my 24gb Windows setup with flash attention 2, peak VRAM usage during inference with exllamav2_hf was around 23.4gb with 0.9gb used at baseline)
 - 6.0bpw-h6: 6 decoder bits per weight, 6 head bits
 - ideal for large (>24gb) VRAM setups
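
For context on the branch layout described above: each quant variant lives on its own git branch of the repo, so a specific bpw build can be fetched by passing the branch name as the revision. The sketch below is illustrative only and is not part of the commit; huggingface_hub's snapshot_download is a real API, but the repo_id shown is an assumption (taken from the base model name in the README) and should be replaced with this quant repo's actual id.

```python
# Minimal sketch: pull one quant branch of the exl2 repo with huggingface_hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b",  # assumed repo id; substitute the actual quant repo
    revision="4.65bpw-h6",  # branch added in this commit; use "6.0bpw-h6" for >24gb VRAM setups
)
print(local_dir)  # local directory containing the downloaded exl2 weights for that branch
```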