Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations • Paper • 2508.09789 • Published 7 days ago • 4
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents • Paper • 2508.13186 • Published 6 days ago • 3
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents • Paper • 2508.04038 • Published 15 days ago • 1
MultiRef: Controllable Image Generation with Multiple Visual References • Paper • 2508.06905 • Published 12 days ago • 13
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos • Paper • 2508.14041 • Published 1 day ago • 40
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL • Paper • 2508.13167 • Published 14 days ago • 65
Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward • Paper • 2508.12800 • Published 3 days ago
Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends • Paper • 2508.11548 • Published 5 days ago • 5
Evaluating Podcast Recommendations with Profile-Aware LLM-as-a-Judge • Paper • 2508.08777 • Published 9 days ago • 10
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer • Paper • 2508.09131 • Published 8 days ago • 9