vLLM

- For 24GB VRAM:
  - max-model-len < 4096 (marlin_awq): not available
  - max-model-len 10240 (2048 + 8192) (awq)

```bash
vllm serve werty1248/Qwen2.5-32B-s1.1-Ko-Native-AWQ \
  --max-model-len 10240 \
  --quantization awq \
  --dtype half \
  --port 8000 \
  --gpu-memory-utilization 0.99 \
  --enforce-eager
```
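
Once the server is running, it exposes an OpenAI-compatible API on the chosen port. Below is a minimal sketch of querying it with the official `openai` Python client; the prompt text and `max_tokens` value are illustrative, and `api_key="EMPTY"` is just a placeholder since vLLM does not require a real key by default.

```python
from openai import OpenAI

# Point the client at the local vLLM server started above (--port 8000).
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # placeholder; vLLM ignores the key by default
)

response = client.chat.completions.create(
    model="werty1248/Qwen2.5-32B-s1.1-Ko-Native-AWQ",
    messages=[{"role": "user", "content": "Hello!"}],  # illustrative prompt
    # Prompt tokens + generated tokens must fit within --max-model-len 10240.
    max_tokens=8192,
)
print(response.choices[0].message.content)
```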

Model tree for werty1248/Qwen2.5-32B-s1.1-Ko-Native-AWQ

- Base model: Qwen/Qwen2.5-32B
- This model is an AWQ quantization of the base model.