Aranizer | Arabic Tokenization with SentencePiece & PBE

updated Aug 25, 2024

Collection of Arabic tokenizers in several vocabulary sizes (32k, 64k, and 86k), built with SentencePiece (SP) and byte-pair encoding (BPE, written "PBE" in the repository names), suitable for training LLMs.

  • riotu-lab/Aranizer-PBE-64k (updated Aug 26, 2024)
  • riotu-lab/Aranizer-SP-32k (updated Aug 25, 2024)
  • riotu-lab/Aranizer-PBE-32k (updated Aug 25, 2024)
  • riotu-lab/Aranizer-SP-64k (updated Aug 25, 2024)
  • riotu-lab/Aranizer-SP-86k (updated Aug 25, 2024)
  • riotu-lab/Aranizer-PBE-86k (updated Aug 25, 2024)
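
The repositories listed above follow a regular naming pattern, so they are easy to enumerate and compare. The following is a minimal sketch, not the authors' documented usage: it assumes the `transformers` package is installed and that each repository can be loaded through `AutoTokenizer.from_pretrained`, which this page does not confirm. The helper names `aranizer_repo_ids`, `load_tokenizer`, and `token_counts` are hypothetical and exist only for illustration.

```python
from itertools import product

ORG = "riotu-lab"
FAMILIES = ("SP", "PBE")          # spelled exactly as in the repository names
SIZES = ("32k", "64k", "86k")     # size suffixes taken from the repository names


def aranizer_repo_ids():
    """Return the six repository IDs that appear in this collection."""
    return [
        f"{ORG}/Aranizer-{family}-{size}"
        for family, size in product(FAMILIES, SIZES)
    ]


def load_tokenizer(repo_id):
    """Load one tokenizer from the Hugging Face Hub.

    Assumption: the repository is loadable via AutoTokenizer. The import is
    done lazily so the rest of this module works without `transformers`.
    """
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def token_counts(text, repo_ids=None):
    """Map each repository ID to the number of tokens it produces for `text`.

    Requires network access, because every tokenizer is downloaded on first use.
    """
    if repo_ids is None:
        repo_ids = aranizer_repo_ids()
    return {rid: len(load_tokenizer(rid).tokenize(text)) for rid in repo_ids}
```

Calling `token_counts("مرحبا بالعالم")` would then give a rough view of how the tokenizers differ in the number of tokens they produce for the same Arabic sentence. No expected numbers are shown here because they depend on the downloaded tokenizer files.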