MIBench: Evaluating Multimodal Large Language Models over Multiple Images Paper • 2407.15272 • Published Jul 21, 2024 • 10
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization Paper • 2306.05087 • Published Jun 8, 2023 • 6