Submitted by ztwang 124 EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning · 9 authors 27 2
Submitted by junkang0909 113 Quantile Advantage Estimation for Entropy-Safe Reasoning · 6 authors 14 2
Submitted by taesiri 99 MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing · 61 authors 45.2k 2
Submitted by P2333 63 Language Models Can Learn from Verbal Feedback Without Scalar Rewards Sea AI Lab 3
Submitted by hyun1905 62 ReviewScore: Misinformed Peer Review Detection with Large Language Models KAIST AI 2
Submitted by xiangan 35 LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training · 22 authors 433 3
Submitted by bltnynk 33 No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping KAIST AI 2
Submitted by xl-zhao 31 PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning · 5 authors 101 5
Submitted by Wiselnn 30 CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning Intern Large Models 68 2
Submitted by lxxiao 30 MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning · 11 authors 32 3
Submitted by yuna0x0 23 See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation · 10 authors 14 2
Submitted by LordNoah 23 UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios · 18 authors 12 2
Submitted by scikkk 21 VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing LLMs for Reasoning 9 2
Submitted by Owen777 21 LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer W2GenAI Lab 189 3
Submitted by abdo-eldesokey 21 Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation · 4 authors 1 2
Submitted by ammarali32 20 COSPADI: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning MTSAIR 2
Submitted by luzimu 18 WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning · 8 authors 2
Submitted by yuhangzang 16 SPARK: Synergistic Policy And Reward Co-Evolving Framework Intern Large Models 20 2
Submitted by wuxiaojun 16 Think-on-Graph 3.0: Efficient and Adaptive LLM Reasoning on Heterogeneous Graphs via Multi-Agent Dual-Evolving Context Retrieval DataArcTech Ltd. 23 3
Submitted by JunkaiZ 16 Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training Scale AI 2
Submitted by maksimko123 14 TUN3D: Towards Real-World Scene Understanding from Unposed Images · 7 authors 14 2
Submitted by Orannue 11 UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models Multimedia Intelligent Processing Group in Communication University of China 26 2
Submitted by taesiri 10 Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning · 16 authors 2
Submitted by dyong 8 WoW: Towards a World omniscient World model Through Embodied Interaction · 36 authors 2
Submitted by LordNoah 8 D-Artemis: A Deliberative Cognitive Framework for Mobile GUI Multi-Agents · 13 authors 2
Submitted by taesiri 6 X-Streamer: Unified Human World Modeling with Audiovisual Interaction · 10 authors 3
Submitted by taesiri 5 FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing · 7 authors 4
Submitted by je1lee 4 ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models · 8 authors 2
Submitted by taesiri 3 Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation · 8 authors 2
Submitted by zhilinw 3 RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards NVIDIA 2
Submitted by pranjalchitale 3 The role of synthetic data in Multilingual, Multi-cultural AI systems: Lessons from Indic Languages Microsoft 2
Submitted by chen-yingfa 2 StateX: Enhancing RNN Recall via Post-training State Expansion · 6 authors 1 2
Submitted by msadat97 2 HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models · 3 authors 2
Submitted by prasannareddyp 2 X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought Reasoning · 6 authors 7 2
Submitted by s-jse 2 CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition Stanford Open Virtual Assistant Lab (OVAL) 2
Submitted by rywang37 1 CAD-Tokenizer: Towards Text-based CAD Prototyping via Modality-Specific Tokenization Microsoft 2
Submitted by Julppe1 1 Finding 3D Positions of Distant Objects from Noisy Camera Movement and Semantic Segmentation Sequences · 3 authors 2
Submitted by NikolaiSkripko - Instruction-Following Evaluation in Function Calling for Large Language Models · 1 author 1 3