FP8-Dynamic quant created with llm-compressor; it can run on cards with 16 GB of VRAM. Update vLLM and Transformers first:
```bash
pip install "vllm>=0.7.2"
pip install "transformers>=4.49"
```
Then run with:
```bash
vllm serve leon-se/Qwen2.5-VL-7B-Instruct-FP8-Dynamic --trust-remote-code
```
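Once the server is up, it exposes vLLM's OpenAI-compatible API (port 8000 by default). A minimal sketch of a vision query; the image URL and prompt are placeholders:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible endpoint; the API key is unused locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="leon-se/Qwen2.5-VL-7B-Instruct-FP8-Dynamic",
    messages=[
        {
            "role": "user",
            "content": [
                # Placeholder image URL; any publicly reachable image works.
                {"type": "image_url", "image_url": {"url": "https://example.com/demo.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```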
Base model: Qwen/Qwen2.5-VL-7B-Instruct
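For reference, FP8-Dynamic checkpoints like this one are typically produced with a short llm-compressor one-shot recipe. The sketch below is an assumption about how this model was made, not the author's exact script; the ignore patterns (LM head and vision tower) and save path are illustrative, and the `oneshot` import location varies by llm-compressor version:

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot  # newer versions: from llmcompressor import oneshot

MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# FP8_DYNAMIC quantizes weights ahead of time and activations on the fly,
# so no calibration dataset is needed. Keeping the LM head and the vision
# tower in higher precision is a common choice (an assumption here).
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["re:.*lm_head", "re:visual.*"],
)

oneshot(model=model, recipe=recipe)

SAVE_DIR = "Qwen2.5-VL-7B-Instruct-FP8-Dynamic"
model.save_pretrained(SAVE_DIR, save_compressed=True)
processor.save_pretrained(SAVE_DIR)
```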