Diffusion Models Through a Global Lens: Are They Culturally Inclusive? Paper • 2502.08914 • Published Feb 13, 2025
When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts Paper • 2503.16826 • Published Mar 21, 2025
MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language Paper • 2505.14395 • Published May 20, 2025
BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation Paper • 2506.00482 • Published May 31, 2025
Uncovering Factor Level Preferences to Improve Human-Model Alignment Paper • 2410.06965 • Published Oct 9, 2024
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages Paper • 2406.09948 • Published Jun 14, 2024
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation Paper • 2412.10424 • Published Dec 10, 2024
CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean Paper • 2403.06412 • Published Mar 11, 2024
CSRT: Evaluation and Analysis of LLMs using Code-Switching Red-Teaming Dataset Paper • 2406.15481 • Published Jun 17, 2024