X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents Paper • 2504.13203 • Published Apr 15 • 34
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning Paper • 2504.01005 • Published Apr 1 • 16
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Paper • 2503.17352 • Published Mar 21 • 24
STIV: Scalable Text and Image Conditioned Video Generation Paper • 2412.07730 • Published Dec 10, 2024 • 75
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory Paper • 2410.10813 • Published Oct 14, 2024 • 12
Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models Paper • 2410.05269 • Published Oct 7, 2024 • 3
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding Paper • 2406.09411 • Published Jun 13, 2024 • 19
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? Paper • 2403.14624 • Published Mar 21, 2024 • 54
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models Paper • 2401.13311 • Published Jan 24, 2024 • 12
VideoCon: Robust Video-Language Alignment via Contrast Captions Paper • 2311.10111 • Published Nov 15, 2023 • 9
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs Paper • 2311.05657 • Published Nov 9, 2023 • 32
AVIS: Autonomous Visual Information Seeking with Large Language Models Paper • 2306.08129 • Published Jun 13, 2023 • 5