An LLM pre-training dataset containing only public domain and openly licensed text
Nikhil Kandpal
nkandpa2
Models (7)
nkandpa2/comma-v0.1-checkpoints • 2
nkandpa2/comma-v0.1-stage2 • 7B • 3
nkandpa2/comma-v0.1-stage1 • 7B • 3
nkandpa2/comma-v0.1-checkpoint-hf • 7B • 5
nkandpa2/comma-v0.1-ablation-hf • 2B • 3
nkandpa2/comma-loss-test • Text Generation • 2B • 2
nkandpa2/Llama_3.2_1B__alpaca_finetune • 2
Datasets (45)
nkandpa2/code_dates_sorted • 218M • 11
nkandpa2/oer_dates_sorted • 646k • 8
nkandpa2/audio_dates_sorted • 1.13M • 2
nkandpa2/forum_dates_sorted • 64.7M • 3
nkandpa2/webtext_dates_sorted • 51.2M • 5
nkandpa2/wiki_dates_sorted • 283M • 5
nkandpa2/gov_dates_sorted • 19.6M • 6
nkandpa2/scientific_papers_dates_sorted • 13M • 2
nkandpa2/all_dates_sorted • 652M • 109
nkandpa2/all_dates • 652M • 3