Performance
Hi, how well does this model work? Have you tested it? I'm interested in using this for my thesis.
Hi @yveeckh,
This model is both faster and better than the 13B SpaceLLaVA; both are among the few VLMs that respond with distance estimates between objects in a scene. It's also the only model scoring above zero on ZeroBench.
To check its capabilities on the tasks you're interested in, you can try chatting with the bot on Discord. I can follow up with more scores using vlm-evaluation.
Generally, I'd expect this fine-tune to perform on par with the Qwen2.5-VL base model on most benchmarks, and perhaps a little better on spatial understanding/reasoning ones like VSR or GQA.
We could also build a benchmark designed to assess how accurately the model estimates distances, perhaps using a dataset like NYUv2, where the depth sensor provides ground truth.
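As a rough sketch of what the ground-truth side of that benchmark could look like: the snippet below back-projects two annotated pixel locations from an NYUv2 depth map into camera coordinates and measures their metric distance, which the model's textual answer could then be scored against. The intrinsics are the commonly cited NYUv2 values and the pixel coordinates are hypothetical, so treat this as a starting point rather than a finished scorer.

```python
import numpy as np

# Approximate NYUv2 RGB camera intrinsics (values as reported in the official
# toolbox; treat as assumptions and swap in the calibration from your copy).
FX, FY = 518.86, 519.47
CX, CY = 325.58, 253.74

def backproject(u, v, depth_m):
    """Back-project pixel (u, v) with metric depth into camera coordinates."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return np.array([x, y, depth_m])

def ground_truth_distance(depth_map, px_a, px_b):
    """Euclidean distance in meters between two annotated object centers."""
    ua, va = px_a
    ub, vb = px_b
    p_a = backproject(ua, va, depth_map[va, ua])
    p_b = backproject(ub, vb, depth_map[vb, ub])
    return float(np.linalg.norm(p_a - p_b))

# Hypothetical usage: depth_map is an HxW float array in meters from NYUv2,
# and the pixel coordinates mark the two objects named in the VQA prompt.
# gt = ground_truth_distance(depth_map, (200, 240), (450, 300))
```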
It looks like VLMEvalKit supports evaluation with a quantitative spatial reasoning benchmark, QSpatial:
https://github.com/open-compass/VLMEvalKit/blob/ac1beb8bf164174393219a6e06220b8d3a5427b1/vlmeval/dataset/image_vqa.py#L1133
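If that benchmark fits, something along these lines might kick off an eval run via VLMEvalKit's `run.py`. The dataset and model identifiers below are assumptions on my part, so check the names registered in vlmeval's configs (and register this fine-tune as a custom model) before running.

```python
# Hypothetical invocation of VLMEvalKit's run.py from Python.
import subprocess

subprocess.run(
    [
        "python", "run.py",
        "--data", "QSpatial_plus", "QSpatial_scannet",  # assumed QSpatial split names
        "--model", "Qwen2.5-VL-3B-Instruct",            # swap in the fine-tune once registered
        "--verbose",
    ],
    check=True,
)
```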
Planning to add evaluation step to VQASynth: https://github.com/remyxai/VQASynth/issues/49
Hi @yveeckh,
Following up to recommend this model: https://huggingface.co/remyxai/SpaceThinker-Qwen2.5VL-3B
In the model card, you'll find evaluation results using Q-Spatial-Bench: https://huggingface.co/datasets/andrewliao11/Q-Spatial-Bench
Comparing SpaceThinker with SpaceQwen and SpaceLLaVA, I find SpaceThinker is much better at task completion.
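If you want to reproduce or extend the Q-Spatial-Bench numbers yourself, a minimal scorer might look like the sketch below. I'm assuming the benchmark's within-a-factor-of-two success criterion and the dataset field names shown in the comments, so verify both against the dataset card before relying on the output.

```python
from datasets import load_dataset

def within_factor_of_two(pred_m: float, gt_m: float) -> bool:
    """Q-Spatial-Bench-style success criterion (as I understand it): a prediction
    counts as correct when it is within a factor of two of the ground truth."""
    if pred_m <= 0 or gt_m <= 0:
        return False
    return max(pred_m / gt_m, gt_m / pred_m) < 2.0

def success_rate(preds, gts):
    """Fraction of predictions that satisfy the factor-of-two criterion."""
    pairs = list(zip(preds, gts))
    return sum(within_factor_of_two(p, g) for p, g in pairs) / len(pairs)

# Hypothetical usage: the split and column names are assumptions -- inspect the
# dataset before trusting them, and make sure predictions use the same unit.
# ds = load_dataset("andrewliao11/Q-Spatial-Bench", split="test")
# gts = [ex["answer_value"] for ex in ds]                              # assumed field
# preds = [run_spacethinker(ex["question"], ex["image"]) for ex in ds]  # your inference fn
# print(f"success rate: {success_rate(preds, gts):.2%}")
```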