OliP's Collections

Evaluation

updated Sep 25, 2024

  • Self-Taught Evaluators

    Paper • 2408.02666 • Published Aug 5, 2024 • 30

  • Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries

    Paper • 2409.12640 • Published Sep 19, 2024 • 2

  • openai/MMMLU

    Viewer • Updated Oct 16, 2024 • 393k • 5.69k • 484

  • HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

    Paper • 2409.16191 • Published Sep 24, 2024 • 43