# DSR1-1.5B-llmc-awq-w4
AWQ-quantized version of DeepSeek-R1-Distill-Qwen-1.5B, produced with llm-compressor.
## Model Details
- Base Model: DeepSeek-R1-Distill-Qwen-1.5B
- Quantization: AWQ W4A16 (4-bit weights, 16-bit activations)
- Group Size: 128
- Framework: llm-compressor (see the quantization sketch below)
- Memory: ~1.6 GB (vs ~3 GB for the FP16 original)
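
A checkpoint like this can be reproduced with llm-compressor's one-shot AWQ flow. The sketch below is a plausible reconstruction, not the exact recipe used: the calibration dataset, sample count, and sequence length are assumptions.

```python
# Sketch of a one-shot AWQ recipe with llm-compressor. Calibration dataset,
# sample count, and sequence length are illustrative assumptions.
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

recipe = [
    # W4A16: 4-bit grouped weights (group size 128), 16-bit activations.
    # The lm_head is left unquantized, as is typical.
    AWQModifier(targets=["Linear"], scheme="W4A16", ignore=["lm_head"]),
]

oneshot(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    dataset="open_platypus",  # assumed calibration set
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=256,
    output_dir="DSR1-1.5B-llmc-awq-w4",
)
```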
## Usage
### vLLM
```python
from vllm import LLM

# vLLM reads the quantization config from the checkpoint, so no extra
# quantization flag is needed.
model = LLM("edge-inference/DSR1-1.5B-llmc-awq-w4")
output = model.generate("Hello, how are you?")
print(output[0].outputs[0].text)
```
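
Default sampling can make R1-style distills loop; DeepSeek recommends a temperature in the 0.5-0.7 range (0.6 suggested). A sketch reusing `model` from above (the prompt and token cap are illustrative):

```python
from vllm import SamplingParams

# Temperature 0.6 / top_p 0.95 follow DeepSeek's guidance for the R1 distills.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)
output = model.generate("How many primes are there below 100?", params)
print(output[0].outputs[0].text)
```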
### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("edge-inference/DSR1-1.5B-llmc-awq-w4")
tokenizer = AutoTokenizer.from_pretrained("edge-inference/DSR1-1.5B-llmc-awq-w4")
```
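
Loading the checkpoint this way requires the compressed-tensors package, which Transformers uses to handle llm-compressor models. A minimal generation sketch (prompt and token budget are illustrative):

```python
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```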
## Performance
- Memory: ~47% reduction (~1.6 GB vs ~3 GB); see the arithmetic sketch below
- Speed: Typically faster decoding, since generation is memory-bandwidth bound and 4-bit weights cut the bytes read per token
- Quality: Minimal degradation; AWQ scales weights to protect the channels most salient to activations
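
The memory numbers follow from back-of-envelope arithmetic over the weight formats (a rough sketch: the parameter count is nominal, the embedding shapes are approximate Qwen2-1.5B values, and which tensors stay at 16 bit is an assumption):

```python
# Rough weight-memory arithmetic; ignores KV cache and runtime overhead.
n_params = 1.5e9           # nominal parameter count
embed = 2 * 151936 * 1536  # embeddings + lm_head at 16 bit (approx. Qwen2 shapes)
body = n_params - embed    # transformer weights, quantized to 4 bit

fp16_gb = n_params * 2 / 1e9  # everything at 16 bit: ~3.0 GB
quant_gb = (
    body * 0.5          # packed 4-bit weights
    + body / 128 * 2    # one fp16 scale per group of 128 weights
    + embed * 2         # unquantized embeddings / lm_head
) / 1e9                 # total: ~1.5 GB, near the observed ~1.6 GB
print(f"fp16 = {fp16_gb:.1f} GB, W4A16 = {quant_gb:.1f} GB")
```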
## License

Same as the base model (DeepSeek License).