Doctor-Shotgun committed on
Commit
a368527
1 Parent(s): 12a9f17

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -20,6 +20,7 @@ Exllama v2 quant of [Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b](https://huggi
 
 Branches:
 - main: measurement.json calculated at 2048 token calibration rows on PIPPA
-- more quants coming
+- 4.65bpw-h6: 4.65 decoder bits per weight, 6 head bits
+- ideal for 24gb GPUs at 8k context (on my 24gb Windows setup with flash attention 2, peak VRAM usage during inference with exllamav2_hf was around 23.4gb with 0.9gb used at baseline)
 - 6.0bpw-h6: 6 decoder bits per weight, 6 head bits
 - ideal for large (>24gb) VRAM setups
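
A rough sanity check of the VRAM notes in this change (not part of the commit): the sketch below estimates the size of the quantized weights from the bits-per-weight figure alone. The 34e9 parameter count is an assumption read off the "34b" in the model name, and `estimate_weight_gb` is a hypothetical helper, not an exllamav2 function. The estimate ignores the 6-bit head layer, the KV cache, activations, and framework overhead, so real usage will be higher.

```python
def estimate_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes."""
    # bits -> bytes (divide by 8), bytes -> GB (divide by 1e9)
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: ~34 billion parameters, inferred from "34b" in the model name.
N_PARAMS = 34e9

for bpw in (4.65, 6.0):
    size_gb = estimate_weight_gb(N_PARAMS, bpw)
    print(f"{bpw} bpw: roughly {size_gb:.1f} GB of weights")
```

Under these assumptions the 4.65bpw weights alone come to just under 20 GB, which leaves a few GB on a 24gb card for context and is in line with the reported 23.4gb peak at 8k context. The 6.0bpw weights alone exceed 24 GB, consistent with the note that this branch targets larger VRAM setups.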