---
title: CoRGI Qwen3-VL Demo
emoji: 🐶
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.41.1
app_file: app.py
pinned: false
license: apache-2.0
---

# CoRGI Qwen3-VL Demo

This Space showcases the CoRGI reasoning pipeline powered entirely by `Qwen/Qwen3-VL-8B-Thinking`.
Upload an image, ask a visual question, and the app will (see the sketch after this list):

  1. Generate structured reasoning steps with visual-verification flags.
  2. Request region-of-interest evidence for steps that require vision.
  3. Synthesize a grounded final answer.
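
Every stage is a round-trip to the same checkpoint. The sketch below shows one such call (stage 1, generating the reasoning steps) through the `transformers` chat API. It is a minimal illustration, not the app's actual code: a recent `transformers` release with Qwen3-VL support is assumed, the prompt is a stand-in for the pipeline's real prompt, and `example.jpg` is a placeholder path.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-8B-Thinking"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative stage-1 prompt; the app's actual prompt differs.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "example.jpg"},  # placeholder path
        {"type": "text", "text": (
            "Question: What color is the dog's collar? "
            "List up to 3 reasoning steps and flag any step "
            "that needs visual verification."
        )},
    ],
}]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))
```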

## Running Locally

```bash
pip install -r requirements.txt
python examples/demo_qwen_corgi.py \
  --model-id Qwen/Qwen3-VL-8B-Thinking \
  --max-steps 3 \
  --max-regions 3
```

To launch the Gradio demo locally:

```bash
python app.py
```

## Configuration Notes

- The Space queues requests sequentially on cpu-basic (ZeroGPU) hardware.
- Set the `CORGI_QWEN_MODEL` environment variable to try another Qwen3-VL checkpoint (for example, `Qwen/Qwen3-VL-4B-Instruct`); see the example after this list.
- The `max_steps` and `max_regions` sliders control how many reasoning steps and ROI candidates the model returns.
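
For example, to run the local demo against the smaller checkpoint (this assumes `app.py` reads `CORGI_QWEN_MODEL` at startup, per the note above):

```bash
CORGI_QWEN_MODEL=Qwen/Qwen3-VL-4B-Instruct python app.py
```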

## UI Overview

- **Chain of Thought**: Displays the structured reasoning steps with vision flags, alongside the exact prompt/response sent to the model.
- **ROI Extraction**: Shows the source image with every grounded bounding box plus per-evidence crops, and lists the prompts used for each verification step.
- **Evidence Descriptions**: Summarizes each grounded region (bbox, description, confidence) with the associated ROI prompts; a hypothetical record shape is sketched after this list.
- **Answer Synthesis**: Highlights the final answer, supporting context, and the synthesis prompt/response pair.
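
For orientation, the Evidence Descriptions tab implies a per-region record roughly shaped like the sketch below; the field names are assumptions for illustration, not the repository's actual schema.

```python
from dataclasses import dataclass

# Hypothetical shape of one grounded-evidence record; field names
# are illustrative and may differ from the pipeline's real output.
@dataclass
class RegionEvidence:
    step_index: int                  # reasoning step this region verifies
    bbox: tuple[int, int, int, int]  # (x1, y1, x2, y2) in image pixels
    description: str                 # model's description of the cropped region
    confidence: float                # model-reported confidence score
```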