LLMs optimized by the community using Neural Magic's LLM Compressor for efficient deployment in vLLM. Contribute and help advance efficient AI!
- RedHatAI/DeepSeek-R1-Distill-Llama-8B-FP8-dynamic • Text Generation • Updated • 1.73k • 2
- RedHatAI/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic • Text Generation • Updated • 8.17k • 8
- RedHatAI/DeepSeek-R1-Distill-Qwen-32B-FP8-dynamic • Text Generation • Updated • 3.96k • 6
- RedHatAI/DeepSeek-R1-Distill-Llama-70B-quantized.w8a8 • Text Generation • Updated • 3.34k • 2
Neural Magic (company, Verified)
AI & ML interests: LLMs, optimization, compression, sparsification, quantization, pruning, distillation, NLP, CV
2:4 sparse versions of Llama-3.1, including transfer learning
- RedHatAI/Sparse-Llama-3.1-8B-ultrachat_200k-2of4-FP8-dynamic • Text Generation • Updated • 26 • 1
- RedHatAI/Sparse-Llama-3.1-8B-gsm8k-2of4 • Text Generation • Updated • 21 • 1
- RedHatAI/Sparse-Llama-3.1-8B-2of4 • Text Generation • Updated • 294 • 62
- RedHatAI/Sparse-Llama-3.1-8B-ultrachat_200k-2of4 • Text Generation • Updated • 63 • 1
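The "2of4" suffix refers to 2:4 semi-structured sparsity: in every contiguous group of four weights, at most two are non-zero, a pattern that Ampere-class GPUs can accelerate in hardware. A rough sketch of the pattern, using plain magnitude-based selection for illustration (the checkpoints above were produced with more sophisticated methods such as SparseGPT, not this rule):

```python
def prune_2of4(weights):
    """Zero out the 2 smallest-magnitude weights in each group of 4.

    Illustrative magnitude-based 2:4 pruning only; production pipelines
    use second-order information rather than plain magnitude.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

row = [0.9, -0.1, 0.05, -1.2, 0.3, 0.2, -0.25, 0.01]
print(prune_2of4(row))  # exactly two non-zeros survive per group of four
```

Because the zero positions are constrained to a fixed pattern, sparse tensor cores can skip them with a compact metadata encoding, which is what makes 2:4 sparsity deployable, unlike arbitrary unstructured sparsity.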
Accurate FP8 quantized models by Neural Magic, ready for use with vLLM!
- RedHatAI/Meta-Llama-3.1-405B-Instruct-FP8 • Text Generation • Updated • 3.12k • 31
- RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8 • Text Generation • Updated • 131k • 43
- RedHatAI/Meta-Llama-3.1-70B-Instruct-FP8 • Text Generation • Updated • 225k • 48
- RedHatAI/Phi-3-medium-128k-instruct-FP8 • Text Generation • Updated • 32 • 5
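FP8 checkpoints store tensors in the 8-bit E4M3 format, whose largest finite value is 448, so each tensor (or row, or token) carries a scale that maps its magnitude range onto that interval. In the "-FP8-dynamic" variants elsewhere on this page, that scale is (as we read the naming) computed per token at runtime rather than calibrated offline. A minimal sketch of the scaling logic only; true FP8 bit-level rounding is omitted:

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def dynamic_scale_and_clamp(row):
    """Compute a per-row (per-token) dynamic scale and map values into the
    FP8 E4M3 range. Only the scaling/clamping step is shown here; the
    actual cast to 8-bit exponent/mantissa codes is omitted.
    """
    amax = max(abs(v) for v in row) or 1.0  # avoid divide-by-zero on all-zero rows
    scale = amax / FP8_E4M3_MAX
    quantized = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in row]
    return quantized, scale

q, s = dynamic_scale_and_clamp([0.5, -2.0, 1.0, 0.25])
dequantized = [v * s for v in q]  # recovers the inputs up to float error
```

Since the scale is recomputed from each row's own maximum, no calibration dataset is needed, which is the practical appeal of the dynamic scheme.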
Neural Magic quantized Llama-3.1 models
- RedHatAI/Meta-Llama-3.1-70B-Instruct-FP8 • Text Generation • Updated • 225k • 48
- RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8 • Text Generation • Updated • 131k • 43
- RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 • Text Generation • Updated • 8.09k • 32
- RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16 • Text Generation • Updated • 26.8k • 29
Accurate INT4 quantized models by Neural Magic, ready for use with vLLM!
- RedHatAI/Meta-Llama-3.1-405B-Instruct-quantized.w4a16 • Text Generation • Updated • 1.65k • 12
- RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 • Text Generation • Updated • 8.09k • 32
- RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16 • Text Generation • Updated • 26.8k • 29
- RedHatAI/Mistral-Nemo-Instruct-2407-quantized.w4a16 • Text Generation • Updated • 146 • 4
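The "w4a16" scheme stores weights in 4 bits while activations stay in 16-bit, with a higher-precision scale kept per group of weights (e.g. group size 128, as in the "G128" naming used elsewhere on this page). A simplified symmetric sketch of the idea; the released models are produced with GPTQ-style algorithms, not this naive round-to-nearest:

```python
def quantize_w4a16_group(weights, group_size=128):
    """Weight-only 4-bit quantization with one scale per group.

    Symmetric variant shown for illustration: each group's max magnitude
    maps to 7, and codes are clamped to the signed int4 range [-8, 7].
    """
    q, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = (max(abs(v) for v in group) or 1.0) / 7.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(v / scale))) for v in group)
    return q, scales

def dequantize(q, scales, group_size=128):
    """Recover approximate weights: each code times its group's scale."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]

w = [i / 10 for i in range(-7, 8)]
codes, scales = quantize_w4a16_group(w)
approx = dequantize(codes, scales)  # close to w, within half a scale step
```

Smaller groups track local magnitude variation better (lower error) at the cost of storing more scales; 128 is a common middle ground.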
Papers that we're proud to integrate into our libraries
- Sparse Finetuning for Inference Acceleration of Large Language Models • Paper • 2310.06927 • Published • 14
- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot • Paper • 2301.00774 • Published • 3
- The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models • Paper • 2203.07259 • Published • 4
- How Well Do Sparse ImageNet Models Transfer? • Paper • 2111.13445 • Published • 1
Explore our breakthrough in sparse fine-tuning LLMs! Our novel method maintains downstream accuracy even with >70% sparsity.
- Sparse Finetuning for Inference Acceleration of Large Language Models • Paper • 2310.06927 • Published • 14
- Sparse Llama Gsm8k • 📚 Solve math problems with chat-based guidance • 16
- RedHatAI/mpt-7b-gsm8k-pruned40-quant-ds • Text Generation • Updated • 20
- RedHatAI/mpt-7b-gsm8k-pruned50-quant-ds • Text Generation • Updated • 56
- RedHatAI/granite-3.1-2b-instruct-quantized.w4a16 • Text Generation • Updated • 147
- RedHatAI/granite-3.1-2b-instruct-quantized.w8a8 • Text Generation • Updated • 1.59k
- RedHatAI/granite-3.1-8b-instruct-quantized.w4a16 • Text Generation • Updated • 1.7k • 1
- RedHatAI/granite-3.1-8b-instruct-quantized.w8a8 • Text Generation • Updated • 1.9k • 1
Vision Language Models (VLMs) quantized by Neural Magic
- RedHatAI/Llama-3.2-11B-Vision-Instruct-FP8-dynamic • Text Generation • Updated • 4.37k • 24
- RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic • Text Generation • Updated • 4.52k • 10
- RedHatAI/pixtral-12b-FP8-dynamic • Text Generation • Updated • 84.3k • 10
- RedHatAI/Phi-3-vision-128k-instruct-W4A16-G128 • Text Generation • Updated • 57 • 1
Llama 3.2 models quantized by Neural Magic
- RedHatAI/Llama-3.2-11B-Vision-Instruct-FP8-dynamic • Text Generation • Updated • 4.37k • 24
- RedHatAI/Llama-3.2-90B-Vision-Instruct-FP8-dynamic • Text Generation • Updated • 4.52k • 10
- RedHatAI/Llama-3.2-1B-Instruct-FP8-dynamic • Text Generation • Updated • 20.6k • 3
- RedHatAI/Llama-3.2-3B-Instruct-FP8-dynamic • Text Generation • Updated • 2.59k • 3
Accurate INT8 quantized models by Neural Magic, ready for use with vLLM!
- RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w8a8 • Text Generation • Updated • 8.47k • 20
- RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 • Text Generation • Updated • 14.5k • 17
- RedHatAI/Meta-Llama-3.1-405B-Instruct-quantized.w8a8 • Text Generation • Updated • 30 • 2
- RedHatAI/Phi-3-medium-128k-instruct-quantized.w8a8 • Text Generation • Updated • 36 • 2
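In the "w8a8" scheme both weights and activations are quantized to INT8, so matmuls can run on INT8 tensor cores. A toy symmetric quantizer for intuition (real checkpoints use per-channel weight scales and per-token activation scales; a single tensor-wide scale is shown here for brevity):

```python
def quantize_int8(values):
    """Symmetric INT8 quantization: one scale maps the tensor's max
    magnitude to 127, and codes are clamped to [-127, 127].

    Toy single-scale version; production w8a8 models use finer-grained
    (per-channel / per-token) scales.
    """
    scale = (max(abs(v) for v in values) or 1.0) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

codes, scale = quantize_int8([0.4, -1.27, 0.635])
restored = [c * scale for c in codes]  # within half a quantization step
```

The worst-case error of round-to-nearest is half a step (scale / 2), which is why outlier values, by inflating the scale, degrade everything else in the tensor and motivate the finer-grained scaling used in practice.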
Sparse pre-trained and fine-tuned Llama models made by Neural Magic + Cerebras
- Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment • Paper • 2405.03594 • Published • 7
- RedHatAI/Llama-2-7b-pruned50-retrained • Text Generation • Updated • 35
- RedHatAI/Llama-2-7b-pruned70-retrained • Text Generation • Updated • 1.87k
- RedHatAI/Llama-2-7b-ultrachat200k-pruned_50 • Text Generation • Updated • 25
Useful LLMs for DeepSparse where we've pruned at least 50% of the weights!
- RedHatAI/OpenHermes-2.5-Mistral-7B-pruned50-quant-ds • Text Generation • Updated • 32 • 2
- RedHatAI/Nous-Hermes-2-SOLAR-10.7B-pruned50-quant-ds • Text Generation • Updated • 24 • 7
- RedHatAI/SOLAR-10.7B-Instruct-v1.0-pruned50-quant-ds • Text Generation • Updated • 75 • 5
- RedHatAI/Llama2-7b-chat-pruned50-quant-ds • Text Generation • Updated • 28 • 9
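The "pruned50" in these names indicates roughly 50% of the weights set to zero (unstructured sparsity), which DeepSparse exploits on CPUs. A toy magnitude-pruning pass, purely for intuition; the listed models were pruned with smarter methods plus retraining to preserve accuracy:

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of the weights.

    Illustrative global magnitude pruning; without the retraining step
    used for the real checkpoints, accuracy would drop sharply.
    """
    k = int(len(weights) * sparsity)  # how many weights to remove
    if k == 0:
        return list(weights)
    threshold = sorted(abs(v) for v in weights)[k - 1]
    pruned, dropped = [], 0
    for v in weights:
        if abs(v) <= threshold and dropped < k:  # guard handles ties at the threshold
            pruned.append(0.0)
            dropped += 1
        else:
            pruned.append(v)
    return pruned

print(magnitude_prune([1.0, -0.2, 0.5, 0.1, -0.8, 0.3]))
```

At 50% unstructured sparsity, half the multiply-accumulates can be skipped by a sparsity-aware runtime, though, unlike 2:4 sparsity, this needs software support rather than fixed-pattern hardware.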