EvalPlus

university

https://evalplus.github.io/

evalplus

Activity Feed

AI & ML interests

Evaluation of Language Models on Code.

Recent Activity


nevetsaix

authored a paper about 2 months ago

Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?

Paper • 2511.13646 • Published Nov 17, 2025 • 8

ganler

updated a Space about 1 year ago

evalplus/README

🔥

ganler

authored 3 papers about 1 year ago

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Paper • 2406.15877 • Published Jun 22, 2024 • 48

Evaluating Language Models for Efficient Code Generation

Paper • 2408.06450 • Published Aug 12, 2024

Learning Code Preference via Synthetic Evolution

Paper • 2410.03837 • Published Oct 4, 2024

ganler

updated a dataset about 1 year ago

evalplus/evalperf

Viewer • Updated Oct 17, 2024 • 120 • 1.3k • 3

nevetsaix

authored 7 papers over 1 year ago

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation

Paper • 2305.01210 • Published May 2, 2023 • 3

Conversational Automated Program Repair

Paper • 2301.13246 • Published Jan 30, 2023

Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM

Paper • 2403.19114 • Published Mar 28, 2024 • 1

A Unified Debugging Approach via LLM-Based Multi-Agent Synergy

Paper • 2404.17153 • Published Apr 26, 2024

Agentless: Demystifying LLM-based Software Engineering Agents

Paper • 2407.01489 • Published Jul 1, 2024 • 65

Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair

Paper • 2309.00608 • Published Sep 1, 2023 • 2

Universal Fuzzing via Large Language Models

Paper • 2308.04748 • Published Aug 9, 2023

ganler

updated a dataset over 1 year ago

evalplus/humanevalplus

Viewer • Updated May 1, 2024 • 164 • 18.2k • 18

ganler

authored a paper over 1 year ago

XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts

Paper • 2404.15247 • Published Apr 23, 2024 • 3

ganler

updated a dataset over 1 year ago

evalplus/mbppplus

Viewer • Updated Apr 17, 2024 • 378 • 9.97k • 14

ganler

authored 4 papers almost 2 years ago

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 152

NeuRI: Diversifying DNN Generation via Inductive Rule Inference

Paper • 2302.02261 • Published Feb 4, 2023 • 3

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation

Paper • 2305.01210 • Published May 2, 2023 • 3

NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers

Paper • 2207.13066 • Published Jul 26, 2022

