Enabling or disabling reasoning on-demand with remote vLLM server?

#15 · opened by abhinavkulkarni

Hi,

I have brought up a remote vLLM server with reasoning enabled as follows:

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    command: --model Qwen/Qwen3-30B-A3B-FP8 --enable-reasoning --reasoning-parser deepseek_r1
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    ports:
      - "8000:8000"
    ipc: host
    restart: unless-stopped
```

Now, I can use the openai.OpenAI client or the requests module to get completions, but is it possible to enable or disable reasoning on-demand, per request? Right now, reasoning always appears to be enabled.
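For reference, this is roughly how I'm calling the server at the moment (a minimal sketch; the base URL, prompt, and sampling parameters are just placeholders). With the reasoning parser enabled, every response comes back with a populated reasoning_content field:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible endpoint; any non-empty API key works locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B-FP8",
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
    temperature=0.6,
    max_tokens=1024,
)

message = response.choices[0].message
# With --reasoning-parser deepseek_r1, vLLM splits the output into
# reasoning_content (the <think> block) and content (the final answer).
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)
```

Ideally I'd like a per-request switch, e.g. something passed via `extra_body` (along the lines of Qwen3's `enable_thinking` chat-template argument), but I'm not sure whether vLLM passes that through to the chat template.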

Thanks for the excellent models!

abhinavkulkarni changed discussion title from "Possibility of enabling and disabling reasoning on-demand with remote vLLM server?" to "Enabling or disabling reasoning on-demand with remote vLLM server?"