any4: Learned 4-bit Numeric Representation for LLMs
Abstract
any4 is a learned 4-bit weight quantization method for LLMs that achieves high accuracy without preprocessing and uses a GPU-efficient lookup table strategy.
We present any4, a learned 4-bit weight quantization solution for large language models (LLMs) that provides arbitrary numeric representations without requiring preprocessing of weights or activations. any4 yields higher accuracy than other 4-bit numeric representation types: int4, fp4, and nf4, as evaluated across a range of model sizes, generations, and families (Llama 2, Llama 3, Mistral, and Mixtral). Although any4 does not require preprocessing of weights or activations, it is also competitive with orthogonal techniques that do require such preprocessing (e.g., AWQ and GPTQ). We also experiment with any3 and any2 and show that they remain competitive at lower bit widths. Additionally, we show that we can calibrate using a single curated diverse sample rather than hundreds of samples from a dataset, as done in most quantization approaches. We also open source tinygemm, a latency-optimized GPU matrix multiplication library for LLMs, which implements any4 using a GPU-efficient lookup table strategy along with other common quantization methods. We open source our code at https://github.com/facebookresearch/any4.
Community
We introduce any4, a new 4-bit weight quantization method for large language models that replaces fixed codebooks like int4, fp4, or nf4 with a learned lookup table (LUT) for each row of the weight matrix. This per-row flexibility allows any4 to map 4-bit codes to arbitrary floating-point values, significantly improving quantization accuracy across models such as Llama 2/3, Mistral, and Mixtral. any4 is calibration-efficient, requiring minimal data and no outlier handling, while still matching or outperforming more complex approaches like GPTQ and AWQ.
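To make the idea concrete, below is a minimal sketch of per-row 4-bit codebook quantization in PyTorch. This is not the authors' implementation: the function names (quantize_row_any4, dequantize_row) are hypothetical, and the codebook here is fit with plain k-means over each row, whereas the paper learns the lookup tables to better preserve layer outputs.

```python
# A minimal sketch (not the authors' implementation) of the core idea behind
# any4: each weight-matrix row gets its own 16-entry lookup table (LUT), and
# every weight is stored as the 4-bit index of its nearest LUT entry. The LUT
# here is fit with plain per-row k-means; treat this as an illustration only.
import torch

def quantize_row_any4(row: torch.Tensor, n_iters: int = 20):
    """Fit a 16-entry codebook to one weight row and return (codes, lut)."""
    # Initialize the 16 centroids from quantiles of the row values.
    lut = torch.quantile(row, torch.linspace(0, 1, 16, dtype=row.dtype))
    for _ in range(n_iters):
        # Assign each weight to its nearest centroid (this is its 4-bit code).
        codes = (row.unsqueeze(1) - lut.unsqueeze(0)).abs().argmin(dim=1)
        # Move each centroid to the mean of the weights assigned to it.
        for k in range(16):
            mask = codes == k
            if mask.any():
                lut[k] = row[mask].mean()
    codes = (row.unsqueeze(1) - lut.unsqueeze(0)).abs().argmin(dim=1)
    return codes.to(torch.uint8), lut

def dequantize_row(codes: torch.Tensor, lut: torch.Tensor) -> torch.Tensor:
    """Reconstruct the row by looking each 4-bit code up in the per-row LUT."""
    return lut[codes.long()]

# Example: quantize one row of a random weight matrix and measure the error.
w = torch.randn(4096)
codes, lut = quantize_row_any4(w)
w_hat = dequantize_row(codes, lut)
print("mean abs error:", (w - w_hat).abs().mean().item())
```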
We also release tinygemm, a GPU-optimized library for low-latency inference that supports not just any4 but also int4, int8, and nf4:
https://github.com/facebookresearch/any4
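For illustration, the sketch below spells out in plain PyTorch what a lookup-table-based 4-bit matmul computes: packed 4-bit codes are unpacked, mapped through each row's 16-entry LUT, and multiplied with the activations. It does not use or reflect tinygemm's actual API; the function names are hypothetical, and a real GPU kernel fuses these steps rather than materializing the dequantized weights.

```python
# A hedged reference (plain PyTorch, unrelated to tinygemm's API) for what a
# lookup-table-based 4-bit GEMM computes: unpack packed 4-bit codes, gather
# each row's weights from its own 16-entry LUT, then multiply with the input.
import torch

def pack_codes(codes: torch.Tensor) -> torch.Tensor:
    """Pack pairs of 4-bit codes (shape [rows, cols]) into bytes."""
    lo, hi = codes[:, 0::2], codes[:, 1::2]
    return (lo | (hi << 4)).to(torch.uint8)

def lut_gemm_reference(x: torch.Tensor, packed: torch.Tensor,
                       luts: torch.Tensor) -> torch.Tensor:
    """x: [batch, in], packed: [out, in // 2] uint8, luts: [out, 16]."""
    lo = (packed & 0x0F).long()
    hi = (packed >> 4).long()
    codes = torch.stack([lo, hi], dim=-1).flatten(start_dim=1)  # [out, in]
    # Gather each row's weights from that row's 16-entry lookup table.
    w = torch.gather(luts, 1, codes)                            # [out, in]
    return x @ w.t()

# Tiny usage example with random codes and LUTs.
out_f, in_f = 8, 16
codes = torch.randint(0, 16, (out_f, in_f), dtype=torch.uint8)
luts = torch.randn(out_f, 16)
packed = pack_codes(codes)
x = torch.randn(2, in_f)
y = lut_gemm_reference(x, packed, luts)
print(y.shape)  # torch.Size([2, 8])
```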