VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Paper • 2504.08837 • Published 29 days ago • 42
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model Paper • 2504.10068 • Published 25 days ago • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations Paper • 2504.10481 • Published 25 days ago • 84
Efficient Generative Model Training via Embedded Representation Warmup Paper • 2504.10188 • Published 25 days ago • 12
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce Paper • 2504.11343 • Published 24 days ago • 16
NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors Paper • 2504.11427 • Published 24 days ago • 18
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs Paper • 2504.11536 • Published 24 days ago • 60
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation Paper • 2504.09454 • Published 26 days ago • 12
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Paper • 2504.08672 • Published 28 days ago • 54
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling Paper • 2504.13169 • Published 22 days ago • 39
DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging Paper • 2504.12364 • Published 23 days ago • 21
Iterative Self-Training for Code Generation via Reinforced Re-Ranking Paper • 2504.09643 • Published 26 days ago • 34
DataDecide: How to Predict Best Pretraining Data with Small Experiments Paper • 2504.11393 • Published 24 days ago • 17
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution Paper • 2504.09566 • Published 26 days ago • 10
AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference Paper • 2504.10326 • Published 25 days ago • 25
InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework Paper • 2504.12395 • Published 23 days ago • 17
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation Paper • 2504.13055 • Published 22 days ago • 19
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models Paper • 2504.10449 • Published 25 days ago • 11
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models Paper • 2504.05303 • Published Apr 7 • 5
IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments Paper • 2504.06827 • Published 30 days ago
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation Paper • 2504.14899 • Published 18 days ago • 20
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations Paper • 2504.13816 • Published 21 days ago • 17
X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents Paper • 2504.13203 • Published 24 days ago • 31
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space Paper • 2504.13835 • Published 21 days ago • 36
LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs Paper • 2504.14655 • Published 19 days ago • 19
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks Paper • 2504.15521 • Published 17 days ago • 63
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities Paper • 2504.16078 • Published 17 days ago • 20
Vidi: Large Multimodal Models for Video Understanding and Editing Paper • 2504.15681 • Published 17 days ago • 15
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning Paper • 2504.16080 • Published 17 days ago • 15
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model Paper • 2504.15843 • Published 17 days ago • 18
I-Con: A Unifying Framework for Representation Learning Paper • 2504.16929 • Published 16 days ago • 30
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published 18 days ago • 73
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation Paper • 2504.17207 • Published 15 days ago • 29
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation Paper • 2504.17502 • Published 15 days ago • 54
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models Paper • 2504.17789 • Published 15 days ago • 23
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs Paper • 2504.17768 • Published 15 days ago • 12
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs Paper • 2504.18415 • Published 14 days ago • 41
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models Paper • 2504.15716 • Published 17 days ago • 9
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark Paper • 2504.16427 • Published 16 days ago • 17
YoChameleon: Personalized Vision and Language Generation Paper • 2504.20998 • Published 10 days ago • 11
UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities Paper • 2504.20734 • Published 10 days ago • 61
WebThinker: Empowering Large Reasoning Models with Deep Research Capability Paper • 2504.21776 • Published 9 days ago • 42
Sadeed: Advancing Arabic Diacritization Through Small Language Model Paper • 2504.21635 • Published 9 days ago • 54
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math Paper • 2504.21233 • Published 9 days ago • 37
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning Paper • 2504.18904 • Published 13 days ago • 9
DeepCritic: Deliberate Critique with Large Language Models Paper • 2505.00662 • Published 8 days ago • 48
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Paper • 2505.00703 • Published 8 days ago • 39
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks Paper • 2505.00234 • Published 8 days ago • 21
ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction Paper • 2504.21855 • Published 9 days ago • 12
Improving Editability in Image Generation with Layer-wise Memory Paper • 2505.01079 • Published 7 days ago • 26
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts Paper • 2504.21117 • Published 10 days ago • 24
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction Paper • 2505.02471 • Published 4 days ago • 11
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning Paper • 2505.01441 • Published 11 days ago • 31
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis Paper • 2505.02625 • Published 4 days ago • 17
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Paper • 2505.01658 • Published 6 days ago • 29
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published 3 days ago • 82
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference Paper • 2505.02922 • Published 4 days ago • 21