---
datasets:
- AntResearchNLP/ViLaSR-data
language:
- en
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
---

This repository contains the ViLaSR-7B model, as presented in [Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing](https://arxiv.org/abs/2506.09965).

For the code, please refer to https://github.com/AntResearchNLP/ViLaSR.

```
@misc{wu2025reinforcingspatialreasoningvisionlanguage,
      title={Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing},
      author={Junfei Wu and Jian Guan and Kaituo Feng and Qiang Liu and Shu Wu and Liang Wang and Wei Wu and Tieniu Tan},
      year={2025},
      eprint={2506.09965},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.09965},
}
```