Compression Papers - a neuralmagic Collection

neuralmagic 's Collections

DeepSeek-R1-Distill Quantized

Granite 3.1 Quantization

Sparse-Llama-3.1-2of4

Vision Language Models Quantization

FP8 LLMs for vLLM

Llama-3.2 Quantization

Llama-3.1 Quantization

INT8 LLMs for vLLM

INT4 LLMs for vLLM

Sparse Foundational Llama 2 Models

Compression Papers

DeepSparse Sparse LLMs

Sparse Finetuning MPT

Compressed LLMs from the Community

Compression Papers

updated Sep 26, 2024

Papers that we're proud to integrate into our libraries

Sparse Finetuning for Inference Acceleration of Large Language Models

Paper • 2310.06927 • Published Oct 10, 2023 • 14
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

Paper • 2301.00774 • Published Jan 2, 2023 • 3
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

Paper • 2203.07259 • Published Mar 14, 2022 • 4
How Well Do Sparse Imagenet Models Transfer?

Paper • 2111.13445 • Published Nov 26, 2021 • 1
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

Paper • 2405.03594 • Published May 6, 2024 • 7