LLaMA3-Quantization - an Efficient-ML Collection

Updated Apr 23, 2024

This is the official collection of quantized models from the paper “How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study”.

Efficient-ML/LLaMA-3-8B-GPTQ-4bit-b128

Updated Apr 21, 2024 • 3

Efficient-ML/LLaMA-3-8B-SmoothQuant-4bit-4bit

Text Generation • Updated Apr 22, 2024 • 10

Efficient-ML/LLaMA-3-8B-AWQ-4bit-b128

Text Generation • Updated Apr 28, 2024 • 8

Efficient-ML/LLaMA-3-8B-SmoothQuant-8bit-8bit

Text Generation • Updated Apr 22, 2024 • 12

Efficient-ML/LLaMA-3-8B-QuIP-2bit

Text Generation • Updated Apr 26, 2024 • 9 • 3

Efficient-ML/LLaMA-3-8B-DB-LLM-2bit-fake

Text Generation • Updated Apr 26, 2024 • 12 • 2

Efficient-ML/LLaMA-3-8B-PB-LLM-1.7bit-fake

Text Generation • Updated Apr 22, 2024 • 10 • 1

Efficient-ML/LLaMA-3-8B-BiLLM-1.1bit-fake

Updated Apr 21, 2024

How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Paper • 2404.14047 • Published Apr 22, 2024 • 46
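
The repository names in this collection appear to follow a common pattern: a quantization method, a bit-width, and optional suffixes. The sketch below splits each name into those parts so the collection can be filtered programmatically. The field names, the `QuantInfo` type, and the `parse_model_id` helper are hypothetical and introduced here for illustration only. The reading of the suffixes is also an assumption, not something the page states: a second bit-width (as in `4bit-4bit`) presumably gives the activation precision, `b128` presumably gives a group or block size, and `fake` presumably marks simulated quantization.

```python
import re
from typing import NamedTuple, Optional

# Repository IDs copied verbatim from the collection listing.
MODEL_IDS = [
    "Efficient-ML/LLaMA-3-8B-GPTQ-4bit-b128",
    "Efficient-ML/LLaMA-3-8B-SmoothQuant-4bit-4bit",
    "Efficient-ML/LLaMA-3-8B-AWQ-4bit-b128",
    "Efficient-ML/LLaMA-3-8B-SmoothQuant-8bit-8bit",
    "Efficient-ML/LLaMA-3-8B-QuIP-2bit",
    "Efficient-ML/LLaMA-3-8B-DB-LLM-2bit-fake",
    "Efficient-ML/LLaMA-3-8B-PB-LLM-1.7bit-fake",
    "Efficient-ML/LLaMA-3-8B-BiLLM-1.1bit-fake",
]


class QuantInfo(NamedTuple):
    """Hypothetical container for the parts of a repository name."""
    method: str                  # e.g. "GPTQ", "DB-LLM"
    bits: float                  # first bit-width in the name
    second_bits: Optional[float] # second bit-width, if any (assumed: activations)
    group_size: Optional[int]    # value after "-b", if any (assumed: group size)
    fake: bool                   # True if the name ends in "-fake"


# Assumed naming pattern, inferred only from the eight names above.
_PATTERN = re.compile(
    r"^Efficient-ML/LLaMA-3-8B-"
    r"(?P<method>.+?)-"
    r"(?P<bits>\d+(?:\.\d+)?)bit"
    r"(?:-(?P<second>\d+(?:\.\d+)?)bit)?"
    r"(?:-b(?P<group>\d+))?"
    r"(?P<fake>-fake)?$"
)


def parse_model_id(model_id: str) -> QuantInfo:
    """Split a repository ID from this collection into its named parts."""
    m = _PATTERN.match(model_id)
    if m is None:
        raise ValueError(f"unrecognized model ID: {model_id!r}")
    return QuantInfo(
        method=m["method"],
        bits=float(m["bits"]),
        second_bits=float(m["second"]) if m["second"] else None,
        group_size=int(m["group"]) if m["group"] else None,
        fake=m["fake"] is not None,
    )


if __name__ == "__main__":
    # List the models from lowest to highest nominal bit-width.
    for info in sorted(map(parse_model_id, MODEL_IDS), key=lambda q: q.bits):
        print(info)
```

The pattern is deliberately strict, so a name that does not fit the assumed convention raises `ValueError` rather than being parsed incorrectly.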