
Yi Cui

onekq
https://onekq.ai
  • onekq_ai
  • onekq
  • yicui

AI & ML interests

Benchmark, Code Generation Model

Recent Activity

posted an update 4 days ago
WebApp1K measures one of the oldest and simplest kinds of task, one that predates ChatGPT. It is code completion; you can also consider it a translation task that maps a test spec into code. It requires no conversation, reasoning (though reasoning sometimes helps), or RL. I don't think it is on the roadmap of top labs. Otherwise, you can't explain why Claude 4 has the same 70+ score on SWE-bench, which is far more challenging than this benchmark. Nor do I encourage model builders to optimize towards my benchmark; topping the leaderboard that way wouldn't be too hard in itself. I just argue that we're still in a very early phase. What I witness now is still the same pattern: the release of generic models strategically optimized towards famous benchmarks. Meanwhile, agent builders (top labs and startups alike) painfully prompt these models to follow their expectations, and pray they won't drift overnight.
updated a Space 7 days ago
onekq-ai/README
posted an update 7 days ago
GPT OSS is, as of now, the top open-source model: its performance is very close to that of Claude and GPT-5, and above that of all other models. https://huggingface.co/spaces/onekq-ai/WebApp1K-models-leaderboard

Organizations

MLX Community, ONEKQ AI

onekq's models

None public yet