Books from the Survivor Library (mostly ~1920s & earlier) OCR'd with recent VLMs
BEEspoke Data
community
AI & ML interests
'an LLM is only as good as the dataset it was trained on' - Sun Tzu
Organization Card
🚧 "raw" pretrained smol_llama checkpoints - WIP 🚧
- BEE-spoke-data/smol_llama-101M-GQA
  Text Generation • 0.1B • Updated • 3.56k • 28
- BEE-spoke-data/smol_llama-81M-tied
  Text Generation • 0.1B • Updated • 1.16k • 6
- BEE-spoke-data/smol_llama-220M-GQA
  Text Generation • 0.2B • Updated • 3.94k • 13
- BEE-spoke-data/verysmol_llama-v11-KIx2
  Text Generation • 0.1B • Updated • 2.04k • 4
Models (56)

- BEE-spoke-data/tiny-random-MPNetForMaskedLM
  Fill-Mask • 0.0B • Updated • 96
- BEE-spoke-data/wordpiece-tokenizer-32k-en_code-msp
  Updated
- BEE-spoke-data/wordpiece-tokenizer-32k-en_code-orig
  Updated
- BEE-spoke-data/bpe-tokenizer-32k-smolNeoX
  Updated
- BEE-spoke-data/pegasus-x-base-synthsumm_open-16k
  Summarization • 0.3B • Updated • 165 • 2
- BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan
  Text Generation • 0.7B • Updated • 9
- BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2
  Text Generation • 0.7B • Updated • 8
- BEE-spoke-data/tFINE-900m-e16-d32-instruct_2e
  Text Generation • 0.9B • Updated • 68
- BEE-spoke-data/tFINE-900m-instruct-orpo
  Text Generation • 0.9B • Updated • 69
- BEE-spoke-data/smol_llama-220M-openhermes
  Text Generation • 0.2B • Updated • 1.16k • 5
Datasets (82)
- BEE-spoke-data/SurvivorLib-Nanonets-OCR-s
  Viewer • Updated • 11.7k • 234 • 3
- BEE-spoke-data/SurvivorLib-rolmOCR
  Viewer • Updated • 13.3k • 170 • 2
- BEE-spoke-data/govdocs1-pdf-source
  Viewer • Updated • 235k • 757 • 2
- BEE-spoke-data/napierone-pdf-nanonets-s
  Viewer • Updated • 9.96k • 133
- BEE-spoke-data/napierone-pdf-olmOCR
  Viewer • Updated • 19k • 43
- BEE-spoke-data/LONGCOT-merged-1M
  Viewer • Updated • 1.7M • 117 • 1
- BEE-spoke-data/govdocs1-by-extension
  Viewer • Updated • 733k • 1.34k • 2
- BEE-spoke-data/cosmopedia-v2-mincols
  Viewer • Updated • 39.1M • 71 • 1
- BEE-spoke-data/reddit-title-body-hf
  Viewer • Updated • 251M • 122 • 4
- BEE-spoke-data/bigpatent-all
  Viewer • Updated • 2.43M • 137