Doctor-Shotgun committed
Commit a368527
1 Parent(s): 12a9f17
Update README.md
README.md CHANGED

@@ -20,6 +20,7 @@ Exllama v2 quant of [Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b](https://huggi
 
 Branches:
 - main: measurement.json calculated at 2048 token calibration rows on PIPPA
--
+- 4.65bpw-h6: 4.65 decoder bits per weight, 6 head bits
+- ideal for 24gb GPUs at 8k context (on my 24gb Windows setup with flash attention 2, peak VRAM usage during inference with exllamav2_hf was around 23.4gb with 0.9gb used at baseline)
 - 6.0bpw-h6: 6 decoder bits per weight, 6 head bits
 - ideal for large (>24gb) VRAM setups
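
For context on the branch layout described above: each quant variant lives on its own git branch of the repo, so a specific bpw build can be fetched by passing the branch name as the revision. The sketch below is illustrative only and is not part of the commit; huggingface_hub's snapshot_download is a real API, but the repo_id shown is an assumption (taken from the base model name in the README) and should be replaced with this quant repo's actual id.

```python
# Minimal sketch: pull one quant branch of the exl2 repo with huggingface_hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b",  # assumed repo id; substitute the actual quant repo
    revision="4.65bpw-h6",  # branch added in this commit; use "6.0bpw-h6" for >24gb VRAM setups
)
print(local_dir)  # local directory containing the downloaded exl2 weights for that branch
```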