LongEval: A Comprehensive Analysis of Long-Text Generation Through a Plan-based Paradigm Paper • 2502.19103 • Published about 1 month ago • 1
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 99
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model Paper • 2410.13639 • Published Oct 17, 2024 • 17
MMRA: A Benchmark for Multi-granularity Multi-image Relational Association Paper • 2407.17379 • Published Jul 24, 2024 • 2
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval Paper • 2401.13478 • Published Jan 24, 2024 • 1
OmniBench: Towards The Future of Universal Omni-Language Models Paper • 2409.15272 • Published Sep 23, 2024 • 29
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm Paper • 2408.08072 • Published Aug 15, 2024 • 35