Image-Text-to-Text
Transformers
Safetensors
GGUF
English
qwen2_5_vl
remyx
qwen2.5-vl
spatial-reasoning
multimodal
vlm
vqasynth
thinking
reasoning
test-time-compute
robotics
embodied-ai
quantitative-spatial-reasoning
distance-estimation
visual-question-answering
conversational
Eval Results
text-generation-inference