Evaluation - an OliP Collection

Updated Sep 25, 2024

Self-Taught Evaluators

Paper • 2408.02666 • Published Aug 5, 2024

Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries

Paper • 2409.12640 • Published Sep 19, 2024

openai/MMMLU

Dataset • Updated Oct 16, 2024

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

Paper • 2409.16191 • Published Sep 24, 2024