yicui's Collections

Benchmark

updated Nov 13, 2024

  • Law of the Weakest Link: Cross Capabilities of Large Language Models

    Paper • 2409.19951 • Published Sep 30, 2024 • 55

  • Multi-lingual Evaluation of Code Generation Models

    Paper • 2210.14868 • Published Oct 26, 2022

  • ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

    Paper • 2410.05080 • Published Oct 7, 2024 • 21

  • LongGenBench: Long-context Generation Benchmark

    Paper • 2410.04199 • Published Oct 5, 2024 • 21

  • Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

    Paper • 2410.07985 • Published Oct 10, 2024 • 33

  • UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

    Paper • 2410.14059 • Published Oct 17, 2024 • 61

  • Efficacy of Synthetic Data as a Benchmark

    Paper • 2409.11968 • Published Sep 18, 2024 • 1