Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs • Paper • 2509.01790 • Published 26 days ago