SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published 6 days ago • 78
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding Paper • 2507.15028 • Published Jul 20 • 20
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17 • 73
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17 • 251
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published Jul 16 • 41
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning Paper • 2507.05920 • Published Jul 8 • 11
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published Feb 13 • 44
Autoregressive Video Generation without Vector Quantization Paper • 2412.14169 • Published Dec 18, 2024 • 14
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 119
LLaVA-Video Collection Models focus on video understanding (previously known as LLaVA-NeXT-Video). • 8 items • Updated Feb 21 • 62