Are distributional representations ready for the real world? Evaluating word vectors for grounded perceptual meaning Paper • 1705.11168 • Published May 31, 2017
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters Paper • 2401.06408 • Published Jan 12, 2024
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research Paper • 2402.00159 • Published Jan 31, 2024
Evaluating Language Model Math Reasoning via Grounding in Educational Curricula Paper • 2408.04226 • Published Aug 8, 2024