English
text generation

BPE_GPT2_TinyStoriesV2_cleaned

BPE Tokenizer Model for dataset 'fhswf/TinyStoriesV2_cleaned'

Based on the GPT-Neo BPE tokenizer, but with a smaller vocabulary. Trained on TinyStoriesV2.

  • Vocab Size: 1024
  • 256 Base chars
  • 1 extra Token: <|endoftext|>
  • 767 merges
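
The vocabulary breakdown above can be checked with a few lines of Python. The sketch below is illustrative only and is not the training script used for this tokenizer: `vocab_size` and `train_bpe` are hypothetical helper names, and the toy trainer merges raw UTF-8 bytes across the whole string, whereas GPT-2-style tokenizers first split text into words and map bytes to printable characters.

```python
from collections import Counter

BASE_SYMBOLS = 256                    # one base symbol per byte value
SPECIAL_TOKENS = ["<|endoftext|>"]    # the single extra token listed above
NUM_MERGES = 767                      # merge count listed above


def vocab_size(num_merges, num_special=len(SPECIAL_TOKENS)):
    """Vocabulary size = base symbols + special tokens + learned merges."""
    return BASE_SYMBOLS + num_special + num_merges


def train_bpe(text, num_merges):
    """Toy byte-level BPE: learn up to `num_merges` merges on `text`.

    Simplified on purpose: no pre-tokenization, so merges may cross
    word boundaries. Returns (merges, final token sequence).
    """
    seq = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing left that repeats, so stop early
        merges.append((a, b))
        # Replace every non-overlapping occurrence of (a, b) with a + b.
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return merges, seq


toy_text = "once upon a time " * 4
toy_merges, toy_tokens = train_bpe(toy_text, num_merges=5)

print(vocab_size(NUM_MERGES))             # 256 + 1 + 767
print(len(toy_merges), len(toy_tokens))   # merges learned, tokens remaining
```

Each merge adds exactly one new token, which is why a target vocabulary of 1024 with 256 base symbols and 1 special token leaves room for 767 merges.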

Dataset used to train fhswf/BPE_GPT2_TinyStoriesV2_cleaned_1024: fhswf/TinyStoriesV2_cleaned