Negative Token Merging: Image-based Adversarial Feature Guidance Paper • 2412.01339 • Published Dec 2, 2024 • 23
Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment Paper • 2411.17188 • Published Nov 26, 2024 • 22
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory Paper • 2411.11922 • Published Nov 18, 2024 • 19
CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models Paper • 2406.12257 • Published Jun 18, 2024
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published Nov 11, 2024 • 36
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback Paper • 2410.19133 • Published Oct 24, 2024 • 11
Improve Vision Language Model Chain-of-thought Reasoning Paper • 2410.16198 • Published Oct 21, 2024 • 26
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models Paper • 2410.02740 • Published Oct 3, 2024 • 52
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Paper • 2409.20566 • Published Sep 30, 2024 • 56
Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback Paper • 2302.12813 • Published Feb 24, 2023 • 1
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models Paper • 2304.09842 • Published Apr 19, 2023 • 1
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts Paper • 2310.02255 • Published Oct 3, 2023 • 2
Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding Paper • 2305.14232 • Published May 23, 2023