Commit History

Add essential_web_500k_tokens.pkl - Tokenized version (pickle format) of 500K chars from Essential-Web
4ca13e5
verified

vukrosic committed on

Add essential_web_500k_tokens.txt - Tokenized version (one token per line) of 500K chars from Essential-Web
989de3a
verified

vukrosic committed on

Add essential_web_500k_text.txt - 500K characters of raw text from Essential-Web dataset
eadb605
verified

vukrosic committed on

Add tokenizer usage documentation
d397346
verified

vukrosic committed on

Upload bpe_tokenizer_16k_n1000000.pkl
a0e8176
verified

vukrosic committed on

initial commit
20cee74
verified

vukrosic committed on