This repository contains the ViLaSR-7B model presented in the paper "Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing" (arXiv:2506.09965).

Please refer to the code at https://github.com/AntResearchNLP/ViLaSR.

@misc{wu2025reinforcingspatialreasoningvisionlanguage,
      title={Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing}, 
      author={Junfei Wu and Jian Guan and Kaituo Feng and Qiang Liu and Shu Wu and Liang Wang and Wei Wu and Tieniu Tan},
      year={2025},
      eprint={2506.09965},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.09965}, 
}
Model size: 8.29B parameters, stored as BF16 safetensors.