Performance
Hi, how well does this model work? Have you tested it? I'm interested in using this for my thesis.
Hi @yveeckh,
This model is both faster and better than the 13B SpaceLLaVA; both are among the few VLMs that respond with distance estimates between objects in a scene. It's also the only model scoring above zero on ZeroBench.
To check its capabilities on the tasks you're interested in, you can try chatting with the bot on Discord. I can follow up with more scores using vlm-evaluation.
Generally, I'd expect this fine-tune to perform on par with the Qwen2.5-VL base model on most benchmarks, and perhaps a little better on spatial understanding/reasoning ones like VSR or GQA.
We could also build a benchmark designed to assess how accurately the model estimates distances, perhaps using a dataset like NYUv2, where the depth sensor provides ground truth.
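As a rough sketch of what the ground-truth side of that benchmark could look like: the snippet below back-projects two annotated pixel locations from an NYUv2 depth map into camera coordinates and measures their metric distance, which the model's textual answer could then be scored against. The intrinsics are the commonly cited NYUv2 values and the pixel coordinates are hypothetical, so treat this as a starting point rather than a finished scorer.

```python
import numpy as np

# Approximate NYUv2 RGB camera intrinsics (values as reported in the official
# toolbox; treat as assumptions and swap in the calibration from your copy).
FX, FY = 518.86, 519.47
CX, CY = 325.58, 253.74

def backproject(u, v, depth_m):
    """Back-project pixel (u, v) with metric depth into camera coordinates."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return np.array([x, y, depth_m])

def ground_truth_distance(depth_map, px_a, px_b):
    """Euclidean distance in meters between two annotated object centers."""
    ua, va = px_a
    ub, vb = px_b
    p_a = backproject(ua, va, depth_map[va, ua])
    p_b = backproject(ub, vb, depth_map[vb, ub])
    return float(np.linalg.norm(p_a - p_b))

# Hypothetical usage: depth_map is an HxW float array in meters from NYUv2,
# and the pixel coordinates mark the two objects named in the VQA prompt.
# gt = ground_truth_distance(depth_map, (200, 240), (450, 300))
```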
It looks like VLMEvalKit supports evaluation with a quantitative spatial reasoning benchmark, QSpatial:
https://github.com/open-compass/VLMEvalKit/blob/ac1beb8bf164174393219a6e06220b8d3a5427b1/vlmeval/dataset/image_vqa.py#L1133
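If that benchmark fits, something along these lines might kick off an eval run via VLMEvalKit's `run.py`. The dataset and model identifiers below are assumptions on my part, so check the names registered in vlmeval's configs (and register this fine-tune as a custom model) before running.

```python
# Hypothetical invocation of VLMEvalKit's run.py from Python.
import subprocess

subprocess.run(
    [
        "python", "run.py",
        "--data", "QSpatial_plus", "QSpatial_scannet",  # assumed QSpatial split names
        "--model", "Qwen2.5-VL-3B-Instruct",            # swap in the fine-tune once registered
        "--verbose",
    ],
    check=True,
)
```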
Planning to add evaluation step to VQASynth: https://github.com/remyxai/VQASynth/issues/49
Hi @yveeckh,
Following up to recommend this model: https://huggingface.co/remyxai/SpaceThinker-Qwen2.5VL-3B
In the model card, you'll find evaluation results using Q-Spatial-Bench: https://huggingface.co/datasets/andrewliao11/Q-Spatial-Bench
Comparing SpaceThinker with SpaceQwen and SpaceLLaVA, I find SpaceThinker is much better at task completion.
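If you want to reproduce or extend the Q-Spatial-Bench numbers yourself, a minimal scorer might look like the sketch below. I'm assuming the benchmark's within-a-factor-of-two success criterion and the dataset field names shown in the comments, so verify both against the dataset card before relying on the output.

```python
from datasets import load_dataset

def within_factor_of_two(pred_m: float, gt_m: float) -> bool:
    """Q-Spatial-Bench-style success criterion (as I understand it): a prediction
    counts as correct when it is within a factor of two of the ground truth."""
    if pred_m <= 0 or gt_m <= 0:
        return False
    return max(pred_m / gt_m, gt_m / pred_m) < 2.0

def success_rate(preds, gts):
    """Fraction of predictions that satisfy the factor-of-two criterion."""
    pairs = list(zip(preds, gts))
    return sum(within_factor_of_two(p, g) for p, g in pairs) / len(pairs)

# Hypothetical usage: the split and column names are assumptions -- inspect the
# dataset before trusting them, and make sure predictions use the same unit.
# ds = load_dataset("andrewliao11/Q-Spatial-Bench", split="test")
# gts = [ex["answer_value"] for ex in ds]                              # assumed field
# preds = [run_spacethinker(ex["question"], ex["image"]) for ex in ds]  # your inference fn
# print(f"success rate: {success_rate(preds, gts):.2%}")
```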