|
--- |
|
datasets: |
|
- AntResearchNLP/ViLaSR-data |
|
language: |
|
- en |
|
base_model: |
|
- Qwen/Qwen2.5-VL-7B-Instruct |
|
pipeline_tag: image-text-to-text |
|
--- |
|
|
|
|
|
This repository contains the ViLaSR-7B model as presented in [Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing](https://arxiv.org/abs/2506.09965). |
|
|
|
Please refer to the code https://github.com/AntResearchNLP/ViLaSR. |
|
|
|
``` |
|
@misc{wu2025reinforcingspatialreasoningvisionlanguage, |
|
title={Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing}, |
|
author={Junfei Wu and Jian Guan and Kaituo Feng and Qiang Liu and Shu Wu and Liang Wang and Wei Wu and Tieniu Tan}, |
|
year={2025}, |
|
eprint={2506.09965}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CV}, |
|
url={https://arxiv.org/abs/2506.09965}, |
|
} |
|
``` |