New Papers - a ThreeSR Collection

updated about 22 hours ago

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Paper • 2503.10615 • Published 24 days ago • 16
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Paper • 2503.10630 • Published 24 days ago • 6
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published 25 days ago • 27
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published 27 days ago • 84
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published 27 days ago • 41
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Paper • 2503.08625 • Published 26 days ago • 26
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction

Paper • 2503.03734 • Published Mar 5 • 1
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond

Paper • 2503.10460 • Published 24 days ago • 27
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

Paper • 2503.21696 • Published 10 days ago • 21
ViLBench: A Suite for Vision-Language Process Reward Modeling

Paper • 2503.20271 • Published 12 days ago • 7
Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published 12 days ago • 23
Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published 12 days ago • 121
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Paper • 2503.19757 • Published 12 days ago • 48
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users

Paper • 2503.02268 • Published Mar 4 • 10
Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published about 1 month ago • 114
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

Paper • 2503.05592 • Published about 1 month ago • 25
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning

Paper • 2503.13444 • Published 20 days ago • 15
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Paper • 2503.11579 • Published 23 days ago • 18
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published 25 days ago • 33
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

Paper • 2503.15558 • Published 19 days ago • 45
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Paper • 2503.12797 • Published 21 days ago • 29
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published 6 days ago • 166
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs

Paper • 2504.00072 • Published 6 days ago • 7
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Paper • 2503.24290 • Published 6 days ago • 58
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Paper • 2503.01743 • Published Mar 3 • 83
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

Paper • 2502.18906 • Published Feb 26 • 12
Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 179