@salma-remyx on Hugging Face: "SpaceThinker-Qwen2.5VL-3B shows a 3B VLM can compete with closed, frontier…"

salma-remyx posted an update 16 days ago

SpaceThinker-Qwen2.5VL-3B shows a 3B VLM can compete with closed, frontier APIs in quantitative spatial reasoning, a key capability for embodied AI applications like drones and robotics.

See how it stacks up against Gemini and OpenAI models on Q-Spatial-Bench in the model card, which also includes .gguf weights, a Colab quickstart, and Docker images.
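Q-Spatial-Bench asks for metric estimates such as distances and sizes. As we understand it, it reports a success rate in which an estimate counts as correct if it falls within a factor of 2 of the ground truth. The sketch below scores free-text answers that way. The factor-of-2 threshold, the unit table, and the helper names `to_meters` and `success_rate` are assumptions for illustration, not the benchmark's official scorer.

```python
import re

# Assumed unit table: conversion factors to meters (extend as needed).
_UNIT_TO_M = {
    "m": 1.0, "meter": 1.0, "meters": 1.0,
    "cm": 0.01, "centimeter": 0.01, "centimeters": 0.01,
    "mm": 0.001,
    "ft": 0.3048, "foot": 0.3048, "feet": 0.3048,
    "in": 0.0254, "inch": 0.0254, "inches": 0.0254,
}


def to_meters(text: str) -> float:
    """Parse the first '<number> <unit>' in text (e.g. '2.5 meters') into meters."""
    match = re.search(r"([-+]?\d*\.?\d+)\s*([a-zA-Z]+)", text)
    if match is None:
        raise ValueError(f"no quantity found in {text!r}")
    value, unit = float(match.group(1)), match.group(2).lower()
    if unit not in _UNIT_TO_M:
        raise ValueError(f"unknown unit {unit!r}")
    return value * _UNIT_TO_M[unit]


def success_rate(preds, targets, delta=2.0):
    """Fraction of predictions within a factor of `delta` of the ground truth.

    The default delta=2.0 is an assumed threshold, not taken from the post.
    """
    if len(preds) != len(targets) or not targets:
        raise ValueError("need equally sized, non-empty lists")
    hits = 0
    for pred, gt in zip(preds, targets):
        p, g = to_meters(pred), to_meters(gt)
        # Hit when the ratio between estimate and truth is at most delta.
        if p > 0 and g > 0 and max(p / g, g / p) <= delta:
            hits += 1
    return hits / len(targets)
```

For example, `success_rate(["2.5 meters", "90 cm", "10 ft"], ["2 m", "3 ft", "1 m"])` counts the first two as hits and the third (about 3.05 m against 1 m) as a miss. Converting everything to meters first keeps the ratio test unit-agnostic.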

SpaceThinker adopts the Qwen2.5VL-3B architecture and is fine-tuned on the SpaceThinker dataset of synthetic spatial reasoning traces, created with VQASynth.
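Roughly, VQASynth produces spatial QA by estimating metric 3D structure from ordinary images and filling question templates from it. A heavily simplified sketch of that idea is below. It assumes a pinhole camera with known intrinsics and a single metric depth value per object. `Intrinsics`, `backproject`, `make_distance_qa`, and the question template are hypothetical names, not the VQASynth API, and the real pipeline works on full point clouds rather than single pixels.

```python
import math
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Hypothetical pinhole camera intrinsics (focal lengths and principal point, in pixels)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u, v, depth_m, k):
    """Lift pixel (u, v) with metric depth into a 3D camera-frame point (pinhole model)."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


def make_distance_qa(name_a, pt_a, name_b, pt_b):
    """Build one templated distance QA pair with a short reasoning trace (hypothetical format)."""
    dist = math.dist(pt_a, pt_b)  # Euclidean distance in meters
    question = f"How far apart are the {name_a} and the {name_b}?"
    reasoning = (
        f"The {name_a} is at about {pt_a[2]:.1f} m depth and the {name_b} "
        f"at about {pt_b[2]:.1f} m; combining their 3D positions gives the separation."
    )
    return {"question": question, "reasoning": reasoning, "answer": f"{dist:.2f} meters"}
```

With `k = Intrinsics(500.0, 500.0, 320.0, 240.0)`, two objects at pixels (320, 240) and (570, 240), both 2 m away, back-project to points 1 m apart, so the generated answer is `"1.00 meters"`.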

This model builds on the SpaceLLaVA series of VLMs, which were fine-tuned on synthetic data for enhanced spatial reasoning, by adding test-time compute for multimodal thinking.
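Test-time "thinking" means the model writes out intermediate reasoning before its final answer, so downstream code usually needs to separate the two. Assuming the response wraps them in `<think>` and `<answer>` tags (check the model card for the actual output format), a minimal splitter could look like this. `split_trace` is a hypothetical helper, not part of any released package.

```python
import re


def split_trace(output: str):
    """Split a response into (reasoning, answer).

    Assumes <think>...</think> and <answer>...</answer> tags; this format is
    an assumption. Without an <answer> tag, the whole text is treated as the answer.
    """
    think = re.search(r"<think>(.*?)</think>", output, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", output, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    final = answer.group(1).strip() if answer else output.strip()
    return reasoning, final
```

Falling back to the raw text when no answer tag is present keeps evaluation from silently dropping responses that skip the thinking format.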

Model: remyxai/SpaceThinker-Qwen2.5VL-3B
Dataset: remyxai/SpaceThinker
Space: remyxai/SpaceThinker-Qwen2.5VL-3B
Code: https://github.com/remyxai/VQASynth
Discussion: open-r1/README#10

salma-remyx

14 days ago

I had a chance to rerun the comparison, this time including SpaceThinker, gpt-4o, and gemini-pro-2.5.

On quantitative spatial reasoning, the open-weight 3B model scores between gpt-4o and gemini-pro-2.5.

https://huggingface.co/remyxai/SpaceThinker-Qwen2.5VL-3B#qspatial-comparison-table-42525

Salma Mayorquin (salma-remyx)