This repository contains the ViLaSR-7B model, as presented in the paper "Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing".
For the code, please refer to https://github.com/AntResearchNLP/ViLaSR.
@misc{wu2025reinforcingspatialreasoningvisionlanguage,
  title={Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing},
  author={Junfei Wu and Jian Guan and Kaituo Feng and Qiang Liu and Shu Wu and Liang Wang and Wei Wu and Tieniu Tan},
  year={2025},
  eprint={2506.09965},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.09965},
}