---
license: apache-2.0
base_model:
- Qwen/Qwen3-235B-A22B
library_name: mlx
---
This is a mixed-precision MLX quantization of Qwen3-235B-A22B, based on the recipe used for ubergarm's ik_llama.cpp quant (https://huggingface.co/ubergarm/Qwen3-235B-A22B-GGUF).
In my own experience, this quant performs better than a standard 4-bit MLX quant with group size 128.

I can run it quite comfortably on an M3 Max with 128 GB RAM at full context (40k tokens) after raising the GPU wired-memory limit with `sudo sysctl iogpu.wired_limit_mb=121000`.
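For reference, the full sequence might look like the sketch below: raise the wired-memory limit, then generate with the `mlx-lm` package's CLI. The model path is a placeholder, not this repo's actual id, and the prompt and token count are arbitrary examples.

```shell
# Allow the GPU to wire up to ~121 GB of RAM (resets on reboot)
sudo sysctl iogpu.wired_limit_mb=121000

# Generate with mlx-lm (pip install mlx-lm);
# replace the placeholder with this repo's Hugging Face id
mlx_lm.generate \
  --model <user>/<this-repo> \
  --prompt "Explain mixture-of-experts models in one paragraph." \
  --max-tokens 256
```

Note that the sysctl setting does not persist across reboots, so it needs to be re-applied after each restart.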
Let me know what your experience is compared to other quants! :-)