turboderp/gemma-3-27b-it-exl3 · Exl3 might be SOTA?

I think it might be SOTA, yes. It's not that surprising since it's based on QTIP, and QTIP is brilliant.

Now, there is a GitHub repo by the original authors, and a couple of QTIP models on HF, but I couldn't actually get any of that to work. I gave up after some days of trying, but I was always going to roll my own implementation anyway, so whatever. I think, though, that it's safe to assume the few QTIP models on HF would be at least as good as their EXL3 counterparts, since the underlying method is roughly the same. The main idea behind EXL3 was to make it all approachable and easy, because working with the reference QTIP code is very hard (respectfully).

There's some more information here and some early benchmarks. I'll update those soon, I think, since I mostly want to focus on KL divergence going forward. Using perplexity to compare quants this way is somewhat flawed.
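
To make the perplexity vs. KL divergence distinction concrete, here's a minimal sketch in plain Python. It is not the EXL3 eval code; the function names and the toy logits are made up for illustration. It shows one way the two metrics can disagree (not necessarily the only issue): perplexity only looks at the probability assigned to the correct token, while KL divergence compares the whole output distribution of the quantized model against the original.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one position's logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def perplexity(logits_per_pos, targets):
    # exp of the mean negative log-likelihood of the target tokens.
    nll = 0.0
    for logits, t in zip(logits_per_pos, targets):
        nll -= math.log(softmax(logits)[t])
    return math.exp(nll / len(targets))

def mean_kl(ref_logits, quant_logits):
    # Mean KL(P_ref || P_quant) over positions, in nats.
    total = 0.0
    for r, q in zip(ref_logits, quant_logits):
        p, pq = softmax(r), softmax(q)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, pq) if pi > 0)
    return total / len(ref_logits)

# Toy data (hypothetical): 3-token vocab, 2 positions, target is token 0.
# The "quantized" logits keep the target logit but swap the other two.
targets = [0, 0]
ref = [[2.0, 1.0, 0.0], [2.0, 1.0, 0.0]]
quant = [[2.0, 0.0, 1.0], [2.0, 0.0, 1.0]]

print("ppl ref:  ", perplexity(ref, targets))
print("ppl quant:", perplexity(quant, targets))
print("mean KL:  ", mean_kl(ref, quant))
```

In this toy case both models get the same perplexity, because the target token's probability is unchanged, but the KL divergence is greater than zero since the rest of the distribution moved. On a real model you'd run the same text through the unquantized and quantized weights and feed the per-position logits into something like this.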