BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation • Paper • 2506.00482 • Published May 31
CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean • Paper • 2403.06412 • Published Mar 11, 2024
CSRT: Evaluation and Analysis of LLMs using Code-Switching Red-Teaming Dataset • Paper • 2406.15481 • Published Jun 17, 2024