pszemraj/fineweb-CC-MAIN-2024-10-insurance-700k-dedup-minified Viewer • Updated about 4 hours ago • 60k • 36
An Empirical Study of Autoregressive Pre-training from Videos Paper • 2501.05453 • Published 9 days ago • 36
Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published 4 days ago • 40