JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks Paper • 2404.03027 • Published Apr 3, 2024
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset Paper • 2411.03554 • Published Nov 5, 2024
Code Agent can be an End-to-end System Hacker: Benchmarking Real-world Threats of Computer-use Agent Paper • 2510.06607 • Published 16 days ago
BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities Paper • 2510.08759 • Published 14 days ago