MiroThinker-v0.1 • Collection • High performance in deep research and tool use. • 7 items • Updated about 12 hours ago • 32
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs • Paper • 2506.05328 • Published Jun 5 • 20
ZeroGUI: Automating Online GUI Learning at Zero Human Cost • Paper • 2505.23762 • Published May 29 • 46
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models • Paper • 2504.15271 • Published Apr 21 • 66
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models • Paper • 2504.10479 • Published Apr 14 • 286
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning • Paper • 2503.10291 • Published Mar 13 • 37
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization • Paper • 2503.01328 • Published Mar 3 • 16
SYNTHETIC-1 • Collection • A collection of tasks & verifiers for reasoning datasets • 9 items • Updated Jul 14 • 63
FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation • Paper • 2111.02394 • Published Nov 3, 2021 • 2
V2PE • Collection • Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding • 3 items • Updated Apr 20 • 3
InternVL Adaptation • Collection • Adaptation Models for Specific Domains • 12 items • Updated Apr 20 • 1
InternVL1.5 • Collection • A Pioneering Open-Source Alternative to GPT-4V • 8 items • Updated Apr 20 • 10
InternVL2.5-MPO • Collection • Enhancing the Reasoning Ability of MLLMs via Mixed Preference Optimization • 16 items • Updated Apr 20 • 24
POINTS1.5: Building a Vision-Language Model towards Real World Applications • Paper • 2412.08443 • Published Dec 11, 2024 • 39