
xiaoqijian

mx1024

AI & ML interests

None yet

Recent Activity

authored a paper 28 days ago
Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance
authored a paper 28 days ago
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design
upvoted a paper 28 days ago
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design

Organizations

OpenReasoning

authored 2 papers 28 days ago

Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance

Paper • 2502.12459 • Published Feb 18

Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design

Paper • 2506.04734 • Published about 1 month ago • 19