Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Paper • 2501.09732 • Published 1 day ago • 40
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 38
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 38
Subject-driven Text-to-Image Generation via Apprenticeship Learning Paper • 2304.00186 • Published Apr 1, 2023
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? Paper • 2406.13121 • Published Jun 19, 2024 • 2
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions Paper • 2403.19651 • Published Mar 28, 2024 • 22
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers Paper • 2311.17136 • Published Nov 28, 2023 • 7
From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces Paper • 2306.00245 • Published May 31, 2023
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions Paper • 2403.19651 • Published Mar 28, 2024 • 22
Instruct-Imagen: Image Generation with Multi-modal Instruction Paper • 2401.01952 • Published Jan 3, 2024 • 31
Instruct-Imagen: Image Generation with Multi-modal Instruction Paper • 2401.01952 • Published Jan 3, 2024 • 31
Gemini: A Family of Highly Capable Multimodal Models Paper • 2312.11805 • Published Dec 19, 2023 • 44