AblationBench Collection
A collection of datasets for evaluating language models on the task of ablation planning in empirical AI research. • 4 items • Updated 30 days ago