vukrosic/essential-web-16k-tokenizer
License: mit
main
1 contributor
History: 6 commits
vukrosic
Add essential_web_500k_tokens.pkl - Tokenized version (pickle format) of 500K chars from Essential-Web
4ca13e5
verified
about 1 month ago
| File | Size | Storage | Scan | Last commit | Updated |
| --- | --- | --- | --- | --- | --- |
| .gitattributes | 1.52 kB | - | Safe | initial commit | about 1 month ago |
| README.md | 4.6 kB | - | - | Add tokenizer usage documentation | about 1 month ago |
| bpe_tokenizer_16k_n1000000.pkl | 194 kB | LFS | Safe (pickle: no problematic imports detected) | Upload bpe_tokenizer_16k_n1000000.pkl | about 1 month ago |
| essential_web_500k_text.txt | 504 kB | - | - | Add essential_web_500k_text.txt - 500K characters of raw text from Essential-Web dataset | about 1 month ago |
| essential_web_500k_tokens.pkl | 350 kB | LFS | pickle: no problematic imports detected | Add essential_web_500k_tokens.pkl - Tokenized version (pickle format) of 500K chars from Essential-Web | about 1 month ago |
| essential_web_500k_tokens.txt | 538 kB | - | - | Add essential_web_500k_tokens.txt - Tokenized version (one token per line) of 500K chars from Essential-Web | about 1 month ago |