LLMs optimized by the community using Neural Magic's LLM Compressor for efficient deployment in vLLM. Contribute and help advance efficient AI!
- RedHatAI/DeepSeek-R1-Distill-Llama-8B-FP8-dynamic • Text Generation • Updated • 1.73k • 2
- RedHatAI/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic • Text Generation • Updated • 8.17k • 8
- RedHatAI/DeepSeek-R1-Distill-Qwen-32B-FP8-dynamic • Text Generation • Updated • 3.96k • 6
- RedHatAI/DeepSeek-R1-Distill-Llama-70B-quantized.w8a8 • Text Generation • Updated • 3.34k • 2
Neural Magic (company, Verified)
AI & ML interests: LLMs, optimization, compression, sparsification, quantization, pruning, distillation, NLP, CV
2:4 sparse versions of Llama-3.1, including transfer learning
- RedHatAI/Sparse-Llama-3.1-8B-ultrachat_200k-2of4-FP8-dynamic • Text Generation • Updated • 26 • 1
- RedHatAI/Sparse-Llama-3.1-8B-gsm8k-2of4 • Text Generation • Updated • 21 • 1
- RedHatAI/Sparse-Llama-3.1-8B-2of4 • Text Generation • Updated • 294 • 62
- RedHatAI/Sparse-Llama-3.1-8B-ultrachat_200k-2of4 • Text Generation • Updated • 63 • 1
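The "2of4" suffix refers to 2:4 semi-structured sparsity: in every contiguous group of four weights, at most two are non-zero, a pattern that Ampere-class GPUs can accelerate in hardware. A rough sketch of the pattern, using plain magnitude-based selection for illustration (the checkpoints above were produced with more sophisticated methods such as SparseGPT, not this rule):

```python
def prune_2of4(weights):
    """Zero out the 2 smallest-magnitude weights in each group of 4.

    Illustrative magnitude-based 2:4 pruning only; production pipelines
    use second-order information rather than plain magnitude.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

row = [0.9, -0.1, 0.05, -1.2, 0.3, 0.2, -0.25, 0.01]
print(prune_2of4(row))  # exactly two non-zeros survive per group of four
```

Because the zero positions are constrained to a fixed pattern, sparse tensor cores can skip them with a compact metadata encoding, which is what makes 2:4 sparsity deployable, unlike arbitrary unstructured sparsity.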
Accurate FP8 quantized models by Neural Magic, ready for use with vLLM!
- RedHatAI/Meta-Llama-3.1-405B-Instruct-FP8 • Text Generation • Updated • 3.12k • 31
- RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8 • Text Generation • Updated • 131k • 43
- RedHatAI/Meta-Llama-3.1-70B-Instruct-FP8 • Text Generation • Updated • 225k • 48
- RedHatAI/Phi-3-medium-128k-instruct-FP8 • Text Generation • Updated • 32 • 5
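FP8 checkpoints store tensors in the 8-bit E4M3 format, whose largest finite value is 448, so each tensor (or row, or token) carries a scale that maps its magnitude range onto that interval. In the "-FP8-dynamic" variants elsewhere on this page, that scale is (as we read the naming) computed per token at runtime rather than calibrated offline. A minimal sketch of the scaling logic only; true FP8 bit-level rounding is omitted:

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def dynamic_scale_and_clamp(row):
    """Compute a per-row (per-token) dynamic scale and map values into the
    FP8 E4M3 range. Only the scaling/clamping step is shown here; the
    actual cast to 8-bit exponent/mantissa codes is omitted.
    """
    amax = max(abs(v) for v in row) or 1.0  # avoid divide-by-zero on all-zero rows
    scale = amax / FP8_E4M3_MAX
    quantized = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in row]
    return quantized, scale

q, s = dynamic_scale_and_clamp([0.5, -2.0, 1.0, 0.25])
dequantized = [v * s for v in q]  # recovers the inputs up to float error
```

Since the scale is recomputed from each row's own maximum, no calibration dataset is needed, which is the practical appeal of the dynamic scheme.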
Neural Magic quantized Llama-3.1 models
- RedHatAI/Meta-Llama-3.1-70B-Instruct-FP8 • Text Generation • Updated • 225k • 48
- RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8 • Text Generation • Updated • 131k • 43
- RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 • Text Generation • Updated • 8.09k • 32
- RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16 • Text Generation • Updated • 26.8k • 29
Accurate INT4 quantized models by Neural Magic, ready for use with vLLM!
- RedHatAI/Meta-Llama-3.1-405B-Instruct-quantized.w4a16 • Text Generation • Updated • 1.65k • 12
- RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 • Text Generation • Updated • 8.09k • 32
- RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16 • Text Generation • Updated • 26.8k • 29
- RedHatAI/Mistral-Nemo-Instruct-2407-quantized.w4a16 • Text Generation • Updated • 146 • 4
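The "w4a16" scheme stores weights in 4 bits while activations stay in 16-bit, with a higher-precision scale kept per group of weights (e.g. group size 128, as in the "G128" naming used elsewhere on this page). A simplified symmetric sketch of the idea; the released models are produced with GPTQ-style algorithms, not this naive round-to-nearest:

```python
def quantize_w4a16_group(weights, group_size=128):
    """Weight-only 4-bit quantization with one scale per group.

    Symmetric variant shown for illustration: each group's max magnitude
    maps to 7, and codes are clamped to the signed int4 range [-8, 7].
    """
    q, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = (max(abs(v) for v in group) or 1.0) / 7.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(v / scale))) for v in group)
    return q, scales

def dequantize(q, scales, group_size=128):
    """Recover approximate weights: each code times its group's scale."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]

w = [i / 10 for i in range(-7, 8)]
codes, scales = quantize_w4a16_group(w)
approx = dequantize(codes, scales)  # close to w, within half a scale step
```

Smaller groups track local magnitude variation better (lower error) at the cost of storing more scales; 128 is a common middle ground.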
Papers that we're proud to integrate into our libraries
- Sparse Finetuning for Inference Acceleration of Large Language Models • Paper • 2310.06927 • Published • 14
- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot • Paper • 2301.00774 • Published • 3
- The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models • Paper • 2203.07259 • Published • 4
- How Well Do Sparse ImageNet Models Transfer? • Paper • 2111.13445 • Published • 1
Explore our breakthrough in sparse fine-tuning LLMs! Our novel method maintains downstream accuracy even with >70% sparsity.
- Sparse Finetuning for Inference Acceleration of Large Language Models • Paper • 2310.06927 • Published • 14
- Sparse Llama Gsm8k • 📚 Solve math problems with chat-based guidance • 16
- RedHatAI/mpt-7b-gsm8k-pruned40-quant-ds • Text Generation • Updated • 20
- RedHatAI/mpt-7b-gsm8k-pruned50-quant-ds • Text Generation • Updated • 56
- RedHatAI/granite-3.1-2b-instruct-quantized.w4a16 • Text Generation • Updated • 147
- RedHatAI/granite-3.1-2b-instruct-quantized.w8a8 • Text Generation • Updated • 1.59k
- RedHatAI/granite-3.1-8b-instruct-quantized.w4a16 • Text Generation • Updated • 1.7k • 1
- RedHatAI/granite-3.1-8b-instruct-quantized.w8a8 • Text Generation • Updated • 1.9k • 1
Vision Language Models (VLMs) quantized by Neural Magic
- RedHatAI/Llama-3.2-11B-Vision-Instruct-FP8-dynamic • Text Generation • Updated • 4.37k • 24
- RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic • Text Generation • Updated • 4.52k • 10
- RedHatAI/pixtral-12b-FP8-dynamic • Text Generation • Updated • 84.3k • 10
- RedHatAI/Phi-3-vision-128k-instruct-W4A16-G128 • Text Generation • Updated • 57 • 1
Llama 3.2 models quantized by Neural Magic
- RedHatAI/Llama-3.2-11B-Vision-Instruct-FP8-dynamic • Text Generation • Updated • 4.37k • 24
- RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic • Text Generation • Updated • 4.52k • 10
- RedHatAI/Llama-3.2-1B-Instruct-FP8-dynamic • Text Generation • Updated • 20.6k • 3
- RedHatAI/Llama-3.2-3B-Instruct-FP8-dynamic • Text Generation • Updated • 2.59k • 3
Accurate INT8 quantized models by Neural Magic, ready for use with vLLM!
- RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w8a8 • Text Generation • Updated • 8.47k • 20
- RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 • Text Generation • Updated • 14.5k • 17
- RedHatAI/Meta-Llama-3.1-405B-Instruct-quantized.w8a8 • Text Generation • Updated • 30 • 2
- RedHatAI/Phi-3-medium-128k-instruct-quantized.w8a8 • Text Generation • Updated • 36 • 2
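In the "w8a8" scheme both weights and activations are quantized to INT8, so matmuls can run on INT8 tensor cores. A toy symmetric quantizer for intuition (real checkpoints use per-channel weight scales and per-token activation scales; a single tensor-wide scale is shown here for brevity):

```python
def quantize_int8(values):
    """Symmetric INT8 quantization: one scale maps the tensor's max
    magnitude to 127, and codes are clamped to [-127, 127].

    Toy single-scale version; production w8a8 models use finer-grained
    (per-channel / per-token) scales.
    """
    scale = (max(abs(v) for v in values) or 1.0) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

codes, scale = quantize_int8([0.4, -1.27, 0.635])
restored = [c * scale for c in codes]  # within half a quantization step
```

The worst-case error of round-to-nearest is half a step (scale / 2), which is why outlier values, by inflating the scale, degrade everything else in the tensor and motivate the finer-grained scaling used in practice.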
Sparse pre-trained and fine-tuned Llama models made by Neural Magic + Cerebras
- Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment • Paper • 2405.03594 • Published • 7
- RedHatAI/Llama-2-7b-pruned50-retrained • Text Generation • Updated • 35
- RedHatAI/Llama-2-7b-pruned70-retrained • Text Generation • Updated • 1.87k
- RedHatAI/Llama-2-7b-ultrachat200k-pruned_50 • Text Generation • Updated • 25
Useful LLMs for DeepSparse where we've pruned at least 50% of the weights!
- RedHatAI/OpenHermes-2.5-Mistral-7B-pruned50-quant-ds • Text Generation • Updated • 32 • 2
- RedHatAI/Nous-Hermes-2-SOLAR-10.7B-pruned50-quant-ds • Text Generation • Updated • 24 • 7
- RedHatAI/SOLAR-10.7B-Instruct-v1.0-pruned50-quant-ds • Text Generation • Updated • 75 • 5
- RedHatAI/Llama2-7b-chat-pruned50-quant-ds • Text Generation • Updated • 28 • 9
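The "pruned50" in these names indicates roughly 50% of the weights set to zero (unstructured sparsity), which DeepSparse exploits on CPUs. A toy magnitude-pruning pass, purely for intuition; the listed models were pruned with smarter methods plus retraining to preserve accuracy:

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of the weights.

    Illustrative global magnitude pruning; without the retraining step
    used for the real checkpoints, accuracy would drop sharply.
    """
    k = int(len(weights) * sparsity)  # how many weights to remove
    if k == 0:
        return list(weights)
    threshold = sorted(abs(v) for v in weights)[k - 1]
    pruned, dropped = [], 0
    for v in weights:
        if abs(v) <= threshold and dropped < k:  # guard handles ties at the threshold
            pruned.append(0.0)
            dropped += 1
        else:
            pruned.append(v)
    return pruned

print(magnitude_prune([1.0, -0.2, 0.5, 0.1, -0.8, 0.3]))
```

At 50% unstructured sparsity, half the multiply-accumulates can be skipped by a sparsity-aware runtime, though, unlike 2:4 sparsity, this needs software support rather than fixed-pattern hardware.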