StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Paper • 2507.05240 • Published 10 days ago • 42
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Paper • 2507.01352 • Published 15 days ago • 50
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published 17 days ago • 77
MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings Paper • 2506.23115 • Published 18 days ago • 36
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published 16 days ago • 185
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models Paper • 2506.15681 • Published 29 days ago • 37
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models Paper • 2506.15681 • Published 29 days ago • 37
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models Paper • 2506.15681 • Published 29 days ago • 37 • 2
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published about 1 month ago • 254
Are Vision-Language Models Truly Understanding Multi-vision Sensor? Paper • 2412.20750 • Published Dec 30, 2024 • 20
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding Paper • 2505.22618 • Published May 28 • 42
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing Paper • 2505.21600 • Published May 27 • 71