Thanks!

#1
by lightenup - opened

Superficial testing (Python, JavaScript codegen/software engineering practices) doesn't show any performance degradation compared to https://chat.z.ai/. It's a great quantization for 96 GB VRAM!

Thanks, great results on a Blackwell 96 GB GPU: I'm getting an average of 80-90 t/s with a 128k context size. Finally, Sonnet at home!

Echoing this thanks. This model and quant are great. Any chance you might also do the 4.5V model that was just released?

QuantTrio org

Absolutely

QuantTrio org

We are working on it. Stay tuned!
