Data, embeddings, and index of MassiveDS, from "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore"
Rulin Shao (rulins)
AI & ML interests: None yet
Recent Activity
Upvoted a paper about 23 hours ago: S*: Test Time Scaling for Code Generation
Updated a model 8 days ago: dreamorg/llama_real_distill_bright_1000_step
Published a model 8 days ago: dreamorg/llama_real_distill_bright_1000_step
Collections: 1
Models: 3
Datasets: 10
rulins/DeepSeek-R1-Distill-Qwen-32B_NUMINA_train_amc_aime_merged_thoughts: 3.64k rows, 30 downloads
rulins/DeepSeek-R1-Distill-Qwen-32B_NUMINA_train_amc_aime: 3.64k rows, 313 downloads, 1 like
rulins/MassiveDS-1.4T: 2.17k downloads, 10 likes
rulins/pes2o_v3: 150M rows, 172 downloads
rulins/raw_data: 514M rows, 1.75k downloads
rulins/MassiveDS-1.4T-raw-data: 514M rows, 253 downloads, 6 likes
rulins/mmlu_searched_results_from_massiveds: 33.5k rows, 339 downloads
rulins/MassiveDS-140B: 3.08M rows, 1.22k downloads, 6 likes
rulins/FineWeb-Edu-1MT: 1k rows, 60 downloads
rulins/FineWeb-Edu-1BT: 665k rows, 52 downloads