Any plans on releasing FlashHead for Qwen3.5 models?
```
pip install flash-head
vllm serve embedl/Qwen3-1.7B-FlashHead-W4A16
```

FlashHead registers through the `vllm.general_plugins` entry point. No source patches, no custom imports.

```
vllm bench latency --model embedl/Qwen3-1.7B-FlashHead-W4A16 --batch-size 1

# Baseline comparison
FLASHHEAD_ENABLED=0 vllm bench latency --model embedl/Qwen3-1.7B-FlashHead-W4A16 --batch-size 1
```
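For context, vLLM discovers plugins like this through Python packaging entry points in the `vllm.general_plugins` group. A minimal sketch of what such a registration might look like in a plugin's `pyproject.toml` (the group name comes from the text above; the package and function names `flash_head:register` are hypothetical illustrations, not the actual FlashHead internals):

```toml
# pyproject.toml of a hypothetical vLLM plugin package
# vLLM scans the "vllm.general_plugins" entry-point group at startup
# and calls each registered function, so no source patches are needed.
[project.entry-points."vllm.general_plugins"]
flash_head = "flash_head:register"  # hypothetical module:function path
```

Because registration happens at install time via the entry point, `vllm serve` and `vllm bench` pick up the plugin automatically once the package is on the Python path.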