Jesse Dodge

JesseDodge

https://jessedodge.github.io/

AI & ML interests

Reproducibility and Efficiency in NLP and ML.

Organizations

None yet

upvoted 2 papers 10 months ago

DataDecide: How to Predict Best Pretraining Data with Small Experiments

Paper • 2504.11393 • Published Apr 15, 2025 • 18

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Paper • 2504.07096 • Published Apr 9, 2025 • 77

authored a paper 10 months ago

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Paper • 2504.07096 • Published Apr 9, 2025 • 77

authored 2 papers almost 2 years ago

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Paper • 2402.00159 • Published Jan 31, 2024 • 65

OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85

authored 3 papers about 2 years ago

AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters

Paper • 2401.06408 • Published Jan 12, 2024 • 1

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

Paper • 2312.10253 • Published Dec 15, 2023 • 8

Paloma: A Benchmark for Evaluating Language Model Fit

Paper • 2312.10523 • Published Dec 16, 2023 • 13

authored a paper over 2 years ago

Evaluating the Social Impact of Generative AI Systems in Systems and Society

Paper • 2306.05949 • Published Jun 9, 2023 • 9