Paper: Pushing the Limits of Large Language Model Quantization via the Linearity Theorem (arXiv:2411.17525, published Nov 26, 2024)
Collection: HIGGS — models prequantized with [HIGGS](https://arxiv.org/abs/2411.17525) zero-shot quantization; requires the latest `transformers` to run (18 items, updated Feb 18)
Paper: TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification (arXiv:2604.14531, published 7 days ago)
Paper: BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation (arXiv:2604.09497, published 13 days ago)
Collection: HLWQ Unified (Weights Q5 + KV Cache Q3) — full-stack HLWQ: Q5 weights + torchao INT4 + Q3 KV cache; formerly PolarQuant Unified (16 items, updated 5 days ago)
Collection: Rethink_SFT_generalization — repo for the paper "Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability" (40 items, updated 12 days ago)
Article: How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs (published 16 days ago)
Collection: Nemotron-Post-Training-v3 — datasets used in the post-training phase of Nemotron Nano and Super v3 (28 items, updated 3 days ago)
Collection: Open Pangram — open models and datasets based on Pangram's ICLR 2026 EditLens paper, licensed for noncommercial use only under CC BY-NC-SA 4.0 (4 items, updated 30 days ago)
Collection: CodeScout — RL-trained code search agents (1.7B, 4B, 14B) that outperform 2–18× larger models using only a Unix terminal; arxiv.org/abs/2603.17829 (12 items, updated Mar 19)
Article: Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries (published Mar 10)
Collection: Distil Efficiency Benchmarks — models used in the blog post www.distillabs.ai/blog/the-10x-inference-tax-you-dont-have-to-pay (9 items, updated Mar 2)