zwgao's picture

zwgao

zwgao

·

AI & ML interests

None yet

Organizations

upvoted a paper 2 months ago

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published Mar 26 • 133

upvoted 3 papers 6 months ago

Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning

Paper • 2512.10534 • Published Dec 11, 2025 • 32

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

Paper • 2512.10756 • Published Dec 11, 2025 • 35

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

Paper • 2512.10739 • Published Dec 11, 2025 • 47

upvoted 2 papers 8 months ago

Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning

Paper • 2510.11027 • Published Oct 13, 2025 • 23

InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models

Paper • 2510.11341 • Published Oct 13, 2025 • 36

upvoted a paper 9 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25, 2025 • 224

upvoted a paper 12 months ago

Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces

Paper • 2506.00123 • Published May 30, 2025 • 35

upvoted 5 papers about 1 year ago

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

Paper • 2505.23762 • Published May 29, 2025 • 45

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

Paper • 2504.15279 • Published Apr 21, 2025 • 78

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 311

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Paper • 2503.19757 • Published Mar 25, 2025 • 51

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published Mar 13, 2025 • 36

upvoted 3 papers over 1 year ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published Dec 12, 2024 • 38

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 161

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 87

upvoted a collection almost 2 years ago

InternVL2.0

Expanding Performance Boundaries of Open-Source MLLM • 13 items • Updated Mar 2 • 89