Enabling or disabling reasoning on-demand with remote vLLM server?

#15 · opened by abhinavkulkarni

Hi,

I have brought up a remote vLLM server with reasoning enabled as follows:

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    command: --model Qwen/Qwen3-30B-A3B-FP8 --enable-reasoning --reasoning-parser deepseek_r1
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    ports:
      - "8000:8000"
    ipc: host
    restart: unless-stopped
```

Now, I can use the openai.OpenAI client or the requests module to get completions, but is it possible to enable or disable reasoning on-demand, per request? Right now, reasoning always appears to be enabled.
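For reference, this is roughly how I'm calling the server at the moment (a minimal sketch; the base URL, prompt, and sampling parameters are just placeholders). With the reasoning parser enabled, every response comes back with a populated reasoning_content field:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible endpoint; any non-empty API key works locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B-FP8",
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
    temperature=0.6,
    max_tokens=1024,
)

message = response.choices[0].message
# With --reasoning-parser deepseek_r1, vLLM splits the output into
# reasoning_content (the <think> block) and content (the final answer).
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)
```

Ideally I'd like a per-request switch, e.g. something passed via `extra_body` (along the lines of Qwen3's `enable_thinking` chat-template argument), but I'm not sure whether vLLM passes that through to the chat template.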

Thanks for the excellent models!

abhinavkulkarni changed discussion title from "Possibility of enabling and disabling reasoning on-demand with remote vLLM server?" to "Enabling or disabling reasoning on-demand with remote vLLM server?"