How do I use this model with vLLM serve?
CUDA_VISIBLE_DEVICES=0 python3 -m vllm.entrypoints.openai.api_server \
  --model mistral-small-24b-bnb-4bit \
  --max_model_len=20000 \
  --port 8080 \
  --quantization bitsandbytes \
  --load-format bitsandbytes \
  --tokenizer_mode mistral \
  --config_format mistral \
  --tool-call-parser mistral \
  --enable-auto-tool-choice
This command doesn't work for me.
I have the same question. I would like to know if this is possible, or if we have to wait for a vLLM update.
My bad, wrong answer...
In the meantime, you guys can use the standard BnB one: https://huggingface.co/unsloth/Mistral-Small-3.1-24B-Instruct-2503-bnb-4bit
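For anyone who just wants something to try with that standard BnB repo, here is a sketch of the kind of command I would expect to work, not a confirmed recipe: it assumes your vLLM build supports bitsandbytes loading for this architecture, and the context length and port are only example values. If the bnb-4bit repo is a regular HF-format checkpoint (rather than Mistral's consolidated format), you shouldn't need the mistral tokenizer/config flags from the command above:

CUDA_VISIBLE_DEVICES=0 python3 -m vllm.entrypoints.openai.api_server \
  --model unsloth/Mistral-Small-3.1-24B-Instruct-2503-bnb-4bit \
  --quantization bitsandbytes \
  --load-format bitsandbytes \
  --max-model-len 20000 \
  --port 8080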
I have the same issue; I don't know which arguments to use to serve this model... Anyone? Thanks!
@shimmyshimmer Hi! What is the difference between these two? The naming convention is not clear.
unsloth/Mistral-Small-3.1-24B-Instruct-2503-unsloth-bnb-4bit
vs. unsloth/Mistral-Small-3.1-24B-Instruct-2503-bnb-4bit
@docgerbil read this blog post https://unsloth.ai/blog/dynamic-4bit