---
title: CoRGI Qwen3-VL Demo
emoji: 🐶
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: "5.41.1"
app_file: app.py
pinned: false
license: apache-2.0
---

# CoRGI Qwen3-VL Demo

This Space showcases the CoRGI reasoning pipeline powered entirely by **Qwen/Qwen3-VL-2B-Instruct**.  
Upload an image, ask a visual question, and the app will:

1. Generate structured reasoning steps with visual-verification flags.  
2. Request region-of-interest evidence for steps that require vision.  
3. Synthesize a grounded final answer.

## Running Locally

```bash
pip install -r requirements.txt
python examples/demo_qwen_corgi.py \
  --model-id Qwen/Qwen3-VL-2B-Instruct \
  --max-steps 3 \
  --max-regions 3
```

To launch the Gradio demo locally:

```bash
python app.py
```

## 📚 Full Documentation

See **[docs/](docs/)** folder for complete documentation:
- 🚀 **[Quick Start](docs/START_HERE.md)** - Begin here!
- 📖 **[Usage Guide](docs/USAGE_GUIDE.md)** - How to use
- 🔧 **[Deployment](docs/DEPLOY_NOW.md)** - Deploy to HF Spaces
- 📊 **[Summary Report](docs/SUMMARY_REPORT.md)** - Full overview

## Configuration Notes

- **Model**: Uses `Qwen/Qwen3-VL-2B-Instruct` (2B parameters, ~5GB VRAM)
- **Single GPU**: Model loads on single GPU (cuda:0) to avoid memory fragmentation
- **Hardware**: The Space runs on `cpu-basic` tier by default
- **Customization**: Set `CORGI_QWEN_MODEL` environment variable to use a different checkpoint
- **Sliders**: `max_steps` and `max_regions` control reasoning depth and ROI candidates

## UI Overview

- **Chain of Thought**: Displays the structured reasoning steps with vision flags, alongside the exact prompt/response sent to the model.  
- **ROI Extraction**: Shows the source image with every grounded bounding box plus per-evidence crops, and lists the prompts used for each verification step.  
- **Evidence Descriptions**: Summarises each grounded region (bbox, description, confidence) with the associated ROI prompts.  
- **Answer Synthesis**: Highlights the final answer, supporting context, and the synthesis prompt/response pair.
- **Performance**: Reports per-stage timings (reasoning, ROI extraction, synthesis) plus overall latency so you can monitor ZeroGPU runtime limits.