VisionReasoner-7B from the Seg-Zero Framework

This repository contains the VisionReasoner-7B model, developed within the Seg-Zero framework introduced in the paper Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement. The model is also associated with the paper VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning.

Code: https://github.com/dvlab-research/Seg-Zero
Project page: https://github.com/dvlab-research/Seg-Zero

Description

Seg-Zero is a framework for reasoning segmentation that derives explicit chain-of-thought reasoning through cognitive reinforcement and demonstrates strong generalizability. The VisionReasoner-7B model employs a decoupled architecture consisting of a reasoning model and a segmentation model. The reasoning model interprets user intentions, generates an explicit reasoning chain, and produces positional prompts, which the segmentation model then uses to generate precise pixel-level masks.
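As a rough illustration of this decoupling, the data flow looks like the sketch below. The function names (run_reasoner, run_segmenter) and the bbox/points fields are hypothetical placeholders, not the repository's actual API; refer to the GitHub code for the real interface.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PositionalPrompt:
    """Output of the reasoning model: a coarse localization of the target object."""
    bbox: Tuple[int, int, int, int]      # (x1, y1, x2, y2) in pixel coordinates
    points: List[Tuple[int, int]]        # representative points inside the object

def run_reasoner(image, query: str) -> PositionalPrompt:
    # Hypothetical wrapper around VisionReasoner-7B: interpret the query,
    # produce an explicit reasoning chain, and return a positional prompt.
    raise NotImplementedError("see the Seg-Zero repository for the actual interface")

def run_segmenter(image, prompt: PositionalPrompt):
    # Hypothetical wrapper around the segmentation model: turn the positional
    # prompt into a pixel-level mask.
    raise NotImplementedError("see the Seg-Zero repository for the actual interface")

def segment_by_reasoning(image, query: str):
    prompt = run_reasoner(image, query)   # step 1: reasoning -> positional prompt
    return run_segmenter(image, prompt)   # step 2: positional prompt -> mask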

Trained exclusively via reinforcement learning with GRPO and without explicit reasoning data, Seg-Zero achieves robust zero-shot generalization and exhibits emergent test-time reasoning capabilities. Experiments show that Seg-Zero-7B reaches a zero-shot performance of 57.5 on the ReasonSeg benchmark, surpassing the prior LISA-7B by 18%. This improvement highlights Seg-Zero's ability to generalize across domains while presenting an explicit reasoning process.
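For context, GRPO (Group Relative Policy Optimization) scores each sampled response against the other responses drawn for the same prompt rather than using a learned critic. Below is a minimal sketch of that group-relative advantage computation; the reward values are made up for illustration, and Seg-Zero's actual reward combines format and segmentation-accuracy terms as described in the paper.

import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # GRPO-style advantage: normalize each sample's reward against the mean and
    # standard deviation of the group sampled for the same prompt.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Illustrative rewards for four sampled reasoning chains on one training query.
rewards = np.array([0.2, 0.9, 0.5, 0.4])
print(group_relative_advantages(rewards))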

Usage

You can load and use this model with the transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the reasoning model; the released weights are stored in bfloat16,
# so loading in that dtype avoids an unnecessary upcast (adjust for your hardware).
model = AutoModelForCausalLM.from_pretrained(
    "Ricky06662/VisionReasoner-7B",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("Ricky06662/VisionReasoner-7B")

For full inference examples, including image processing and input formatting, please refer to the project's GitHub repository.

Citation

If you find our work helpful or inspiring, please feel free to cite our papers:

@article{liu2025segzero,
  title        = {Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement},
  author       = {Liu, Yuqi and Peng, Bohao and Zhong, Zhisheng and Yue, Zihao and Lu, Fanbin and Yu, Bei and Jia, Jiaya},
  journal      = {arXiv preprint arXiv:2503.06520},
  year         = {2025}
}

@article{liu2025visionreasoner,
  title        = {VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning},
  author       = {Liu, Yuqi and Qu, Tianyuan and Zhong, Zhisheng and Peng, Bohao and Liu, Shu and Yu, Bei and Jia, Jiaya},
  journal      = {arXiv preprint arXiv:2505.12081},
  year         = {2025}
}