File size: 858 Bytes
12e9427
 
 
 
 
 
 
89f84f2
12e9427
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
---
datasets:
- AntResearchNLP/ViLaSR-data
language:
- en
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
---


This repository contains the ViLaSR-7B model as presented in [Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing](https://arxiv.org/abs/2506.09965).

Please refer to the code https://github.com/AntResearchNLP/ViLaSR.

```
@misc{wu2025reinforcingspatialreasoningvisionlanguage,
      title={Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing}, 
      author={Junfei Wu and Jian Guan and Kaituo Feng and Qiang Liu and Shu Wu and Liang Wang and Wei Wu and Tieniu Tan},
      year={2025},
      eprint={2506.09965},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.09965}, 
}
```