File size: 858 Bytes
12e9427 89f84f2 12e9427 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
---
datasets:
- AntResearchNLP/ViLaSR-data
language:
- en
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
---
This repository contains the ViLaSR-7B model as presented in [Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing](https://arxiv.org/abs/2506.09965).
Please refer to the code https://github.com/AntResearchNLP/ViLaSR.
```
@misc{wu2025reinforcingspatialreasoningvisionlanguage,
title={Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing},
author={Junfei Wu and Jian Guan and Kaituo Feng and Qiang Liu and Shu Wu and Liang Wang and Wei Wu and Tieniu Tan},
year={2025},
eprint={2506.09965},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.09965},
}
``` |