DSR1-8B-llmc-awq-w4

AWQ-quantized version of DeepSeek-R1-Distill-Llama-8B, produced with llm-compressor.

Model Details

  • Base Model: DeepSeek-R1-Distill-Llama-8B
  • Quantization: AWQ W4A16 (4-bit weights, 16-bit activations)
  • Group Size: 128
  • Framework: llm-compressor (a recipe sketch follows this list)
  • Memory: ~5.3GB (vs 15GB original)
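
This checkpoint was produced with llm-compressor. Below is a minimal sketch of a comparable W4A16 AWQ recipe; it is not the exact script used for this model, the AWQModifier import path and arguments can differ between llm-compressor releases, and the calibration dataset and sample count shown are assumptions.

from llmcompressor.modifiers.awq import AWQModifier  # import path may vary by llm-compressor version (assumption)
from llmcompressor.transformers import oneshot

# 4-bit weights, 16-bit activations; the W4A16 preset uses group size 128
recipe = AWQModifier(targets=["Linear"], scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    dataset="open_platypus",  # assumed calibration set
    recipe=recipe,
    output_dir="DSR1-8B-llmc-awq-w4",
    max_seq_length=2048,
    num_calibration_samples=512,
)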

Usage

vLLM

from vllm import LLM

model = LLM("benyamini/DSR1-8B-llmc-awq-w4")
output = model.generate("Hello, how are you?")
print(output[0].outputs[0].text)
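
The call above uses vLLM's default sampling settings. For explicit control, pass a SamplingParams object; the temperature and top_p values below are illustrative assumptions, not settings tuned for this quantized checkpoint.

from vllm import LLM, SamplingParams

model = LLM("benyamini/DSR1-8B-llmc-awq-w4")
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)  # illustrative values
outputs = model.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)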

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("benyamini/DSR1-8B-llmc-awq-w4")
tokenizer = AutoTokenizer.from_pretrained("benyamini/DSR1-8B-llmc-awq-w4")
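
The snippet above only loads the weights; loading the quantized checkpoint through Transformers generally requires the compressed-tensors package. A minimal generation sketch follows, assuming the tokenizer ships DeepSeek's chat template.

messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))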

Performance

  • Perplexity: 16.16 on WikiText (vs 15.02 for the BF16 baseline); a reproduction sketch follows this list
  • Memory: ~65% reduction (5.3GB vs 15GB)
  • Quality: ~7.6% perplexity increase
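
The perplexity above can be sanity-checked with a simple non-overlapping-window evaluation. The sketch below assumes the wikitext-2-raw-v1 test split and a 2048-token window, so the absolute number may not match the figure reported here exactly.

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "benyamini/DSR1-8B-llmc-awq-w4"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Concatenate the WikiText test split and tokenize once
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids

# Average negative log-likelihood over fixed, non-overlapping windows
window, nlls = 2048, []
for start in range(0, ids.shape[1] - window, window):
    chunk = ids[:, start:start + window].to(model.device)
    with torch.no_grad():
        nlls.append(model(chunk, labels=chunk).loss.float())
print("perplexity:", torch.exp(torch.stack(nlls).mean()).item())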

License

Same as the base model (DeepSeek License).
