Thanks!
#1 opened by lightenup
Superficial testing (Python and JavaScript codegen, software-engineering practices) doesn't show any performance degradation compared to https://chat.z.ai/. It's a great quantization for 96 GB VRAM!
Thanks, great results on a Blackwell 96 GB GPU: averaging 80-90 t/s with a 128k context size. Finally, Sonnet at home!
Echoing this thanks. This model and quant are great. Any chance you might also do the 4.5V model that was just released?
Absolutely
we are working on it. Stay tuned!