Enabling or disabling reasoning on-demand with remote vLLM server?
#15
by abhinavkulkarni
Hi,
I have brought up a remote vLLM server with reasoning enabled as follows:
```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    command: --model Qwen/Qwen3-30B-A3B-FP8 --enable-reasoning --reasoning-parser deepseek_r1
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    ports:
      - "8000:8000"
    ipc: host
    restart: unless-stopped
```
Now, I can use the `openai.OpenAI` client or the `requests` module to get completions, but is it possible to enable or disable reasoning on demand? Reasoning seems to always be enabled.
Thanks for the excellent models!
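P.S. From the Qwen3 model card, it looks like the chat template exposes an `enable_thinking` switch, and recent vLLM builds appear to forward `chat_template_kwargs` on the OpenAI-compatible endpoint, so something like the sketch below might toggle reasoning per request. This is just my assumption based on the docs, not something I have verified against this exact server setup:

```python
# Sketch only: assumes a recent vLLM OpenAI-compatible server at localhost:8000
# that forwards chat_template_kwargs, and that the Qwen3 chat template
# honors an `enable_thinking` flag.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Disable reasoning for this single request via chat_template_kwargs.
response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B-FP8",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)
```

The model card also seems to describe `/think` and `/no_think` soft switches that can be appended to the user message as a per-turn alternative. Is one of these the intended way to do this?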
abhinavkulkarni changed discussion title from "Possibility of enabling and disabling reasoning on-demand with remote vLLM server?" to "Enabling or disabling reasoning on-demand with remote vLLM server?"