Running on Zero • VLM Object Understanding 🦀 • Explore object detection, visual grounding, keypoint Detecti
Post • Updated my HF Space for vibe testing smol VLMs on object detection, visual grounding, keypoint detection & counting! 👓🆕 Compare Qwen2.5 VL 3B vs Moondream 2B side-by-side with annotated images & text outputs. Try examples or test your own images! 🏃📱 Space: sergiopaniego/vlm_object_understanding
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Paper • 2506.17218 • Published 15 days ago • 26
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 116
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 108