How to run in vLLM
Can you please update the instructions on how to run this quantized model in vLLM?
Thanks!
try using sglang, it has a vllm backend
python3 -m sglang.launch_server \
    --served-model-name tonjoo-coder \
    --model-path unsloth/Devstral-Small-2505-bnb-4bit \
    --chat-template /models/devstral.jinja \
    --port 8000 \
    --host 0.0.0.0 \
    --mem-fraction-static 0.8
Actually I tried tp=2 but it's not working; tp=1 should work.
You should be able to do it fine
vllm serve unsloth/Devstral-Small-2505-bnb-4bit --quantization bitsandbytes --load-format bitsandbytes
see https://docs.vllm.ai/en/latest/features/quantization/bnb.html
That worked, thanks!
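For anyone following along: once either server is up (vllm serve or sglang.launch_server), it exposes an OpenAI-compatible chat endpoint on port 8000. A minimal client sketch, assuming the default localhost:8000 endpoint; the helper function name is illustrative, not part of either project's API:

```python
import json

# Builds the JSON body for the OpenAI-compatible /v1/chat/completions
# endpoint that both vLLM and sglang expose. The model name should match
# what was passed to `vllm serve` (or --served-model-name for sglang).
def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_chat_request(
    "unsloth/Devstral-Small-2505-bnb-4bit",
    "Write a Python function that reverses a string.",
)
print(json.dumps(body, indent=2))

# To actually send it (requires the server to be running):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(body).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

With the sglang command above, the model name in the request would be tonjoo-coder instead, since that is what --served-model-name sets.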
Hi @todiadiyatmo, what is your chat-template devstral.jinja?
It's in the Unsloth guide: https://docs.unsloth.ai/basics/tutorials-how-to-fine-tune-and-run-llms/devstral-how-to-run-and-fine-tune#tutorial-how-to-run-devstral-in-ollama. The guide has been updated though; if I'm not mistaken, the file is this:
https://huggingface.co/unsloth/Devstral-Small-2505-GGUF/blob/main/template