The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks Paper • 2504.15521 • Published 5 days ago • 58
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published 5 days ago • 51
Learning Adaptive Parallel Reasoning with Language Models Paper • 2504.15466 • Published 5 days ago • 38
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale Paper • 2504.16030 • Published 5 days ago • 27
BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation Paper • 2504.14538 • Published 7 days ago • 23
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs Paper • 2504.15415 • Published 6 days ago • 20
Personalized Text-to-Image Generation with Auto-Regressive Models Paper • 2504.13162 • Published 10 days ago • 17
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning Paper • 2504.13820 • Published 9 days ago • 16
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities Paper • 2504.16078 • Published 5 days ago • 15
Vidi: Large Multimodal Models for Video Understanding and Editing Paper • 2504.15681 • Published 5 days ago • 14
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning Paper • 2504.16080 • Published 5 days ago • 13
RealisDance-DiT: Simple yet Strong Baseline towards Controllable Character Animation in the Wild Paper • 2504.14977 • Published 6 days ago • 9
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents Paper • 2504.15785 • Published 5 days ago • 15
MR. Video: "MapReduce" is the Principle for Long Video Understanding Paper • 2504.16082 • Published 5 days ago • 5
Progent: Programmable Privilege Control for LLM Agents Paper • 2504.11703 • Published 11 days ago • 5
IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property Paper • 2504.15524 • Published 5 days ago • 4
CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting Paper • 2504.15485 • Published 5 days ago • 5