Last commit c1c7f1e (dung-vpt-uney): Deploy CoRGI demo - 2025-10-29 14:27:36

preview code

raw

history blame contribute delete

2.25 kB

A newer version of the Gradio SDK is available: 5.49.1

Upgrade

metadata

title: CoRGI Qwen3-VL Demo
emoji: 🐶
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.41.1
app_file: app.py
pinned: false
license: apache-2.0

CoRGI Qwen3-VL Demo

This Space showcases the CoRGI reasoning pipeline powered entirely by Qwen/Qwen3-VL-2B-Instruct.
Upload an image, ask a visual question, and the app will:

1. Generate structured reasoning steps with visual-verification flags.
2. Request region-of-interest evidence for steps that require vision.
3. Synthesize a grounded final answer.
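The three stages above can be sketched as a small orchestration function. This is a minimal illustration, not the Space's actual code: `ReasoningStep`, `Evidence`, `run_pipeline`, and the three callables are hypothetical names, the model calls are left as injected functions (assumed to already hold the uploaded image), and it assumes `max_regions` is applied per step.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical bounding-box convention: (x1, y1, x2, y2).
BBox = Tuple[float, float, float, float]


@dataclass
class ReasoningStep:
    index: int
    statement: str
    needs_vision: bool  # the visual-verification flag


@dataclass
class Evidence:
    step_index: int
    bbox: BBox
    description: str


def run_pipeline(
    question: str,
    generate_steps: Callable[[str], List[ReasoningStep]],
    extract_evidence: Callable[[ReasoningStep], List[Evidence]],
    synthesize: Callable[[str, List[ReasoningStep], List[Evidence]], str],
    max_steps: int = 3,
    max_regions: int = 3,
) -> str:
    # Stage 1: structured reasoning, truncated to max_steps.
    steps = generate_steps(question)[:max_steps]

    # Stage 2: request ROI evidence only for steps flagged as needing vision.
    # Assumption: max_regions caps the regions kept per step.
    evidence: List[Evidence] = []
    for step in steps:
        if step.needs_vision:
            evidence.extend(extract_evidence(step)[:max_regions])

    # Stage 3: synthesize the final answer from the steps and the evidence.
    return synthesize(question, steps, evidence)
```

Passing the stages in as callables keeps the sketch independent of any particular model API; in the real app each one would presumably wrap a call to Qwen3-VL.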

Running Locally

pip install -r requirements.txt
python examples/demo_qwen_corgi.py \
  --model-id Qwen/Qwen3-VL-2B-Instruct \
  --max-steps 3 \
  --max-regions 3

To launch the Gradio demo locally:

python app.py

📚 Full Documentation

See the docs/ folder for complete documentation:

🚀 Quick Start - Begin here!
📖 Usage Guide - How to use
🔧 Deployment - Deploy to HF Spaces
📊 Summary Report - Full overview

Configuration Notes

Model: Uses Qwen/Qwen3-VL-2B-Instruct (2B parameters, ~5 GB VRAM).
Single GPU: The model loads on a single GPU (cuda:0) to avoid memory fragmentation.
Hardware: The Space runs on the cpu-basic tier by default.
Customization: Set the CORGI_QWEN_MODEL environment variable to use a different checkpoint.
Sliders: max_steps and max_regions control the reasoning depth and the number of ROI candidates.
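One way the checkpoint override and device choice described above could be resolved is sketched below. Only the `CORGI_QWEN_MODEL` variable name, the default model ID, and `cuda:0` come from this README; the helper names `resolve_model_id` and `pick_device`, and the CPU fallback, are assumptions for illustration.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL_ID = "Qwen/Qwen3-VL-2B-Instruct"


def resolve_model_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the checkpoint named by CORGI_QWEN_MODEL, or the default."""
    env = os.environ if env is None else env
    value = env.get("CORGI_QWEN_MODEL", "").strip()
    # An unset or blank variable falls back to the default checkpoint.
    return value or DEFAULT_MODEL_ID


def pick_device(cuda_available: bool) -> str:
    """Pin the model to one GPU when CUDA is present (assumed CPU fallback)."""
    return "cuda:0" if cuda_available else "cpu"
```

In an app that uses PyTorch, `pick_device(torch.cuda.is_available())` would supply the flag; on the cpu-basic tier this resolves to `"cpu"`.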

UI Overview

Chain of Thought: Displays the structured reasoning steps with vision flags, alongside the exact prompt/response sent to the model.
ROI Extraction: Shows the source image with every grounded bounding box plus per-evidence crops, and lists the prompts used for each verification step.
Evidence Descriptions: Summarises each grounded region (bbox, description, confidence) with the associated ROI prompts.
Answer Synthesis: Highlights the final answer, supporting context, and the synthesis prompt/response pair.
Performance: Reports per-stage timings (reasoning, ROI extraction, synthesis) plus overall latency so you can monitor ZeroGPU runtime limits.
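Per-stage timings like those in the Performance panel can be collected with a small context manager. The sketch below is an assumption about how such timing might be done, not the app's implementation; `StageTimer` and the stage names are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class StageTimer:
    """Accumulates wall-clock seconds per named pipeline stage."""

    def __init__(self) -> None:
        self.timings: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raises.
            elapsed = time.perf_counter() - start
            self.timings[name] = self.timings.get(name, 0.0) + elapsed

    @property
    def total(self) -> float:
        # Overall latency as the sum of the recorded stages.
        return sum(self.timings.values())


timer = StageTimer()
with timer.stage("reasoning"):
    pass  # the reasoning call would go here
with timer.stage("roi_extraction"):
    pass  # the ROI extraction call would go here
with timer.stage("synthesis"):
    pass  # the answer synthesis call would go here
```

After a run, `timer.timings` holds one entry per stage and `timer.total` gives the summed latency.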