xiaoqijian
mx1024
AI & ML interests
None yet
Recent Activity
authored a paper 28 days ago
Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance
authored a paper 28 days ago
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design