DataDecide: How to Predict Best Pretraining Data with Small Experiments • Paper • 2504.11393 • Published Apr 15
ProX Dataset • Collection • A collection of pre-training corpora refined by ProX • 6 items • Updated Feb 14