qwen-amharic

Private checkpoint of a GPT-style language model for Amharic.

Environment

  • Device: CUDA
  • GPU: Tesla P100-PCIE-16GB
  • GPU memory: 17.1 GB
  • 🌱 Set all seeds to 42 (seeding and device setup sketched below)
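
The seed/device setup above corresponds to a few lines of standard PyTorch. A minimal sketch, assuming a helper named `set_all_seeds` (the name is illustrative, not from the repo's actual code):

```python
import random

import numpy as np
import torch


def set_all_seeds(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch (CPU and all CUDA devices)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


set_all_seeds(42)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    props = torch.cuda.get_device_properties(0)
    # Reports e.g. "Tesla P100-PCIE-16GB (17.1 GB)" on the machine above.
    print(f"{props.name} ({props.total_memory / 1e9:.1f} GB)")
```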

Model Configuration

  • Architecture: 384-dim embeddings, 6 layers, 8 attention heads, 1536-dim feed-forward
  • Training: 20,000 steps, batch size 24
  • Data: 500,000-token budget, seq_len 512 (collected into a config sketch below)
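
For concreteness, the hyperparameters above can be collected into a single config object. A sketch; field names such as `d_model` are assumptions, not necessarily the repo's own:

```python
from dataclasses import dataclass


@dataclass
class SmallConfig:
    # Architecture: 384d, 6L, 8H, 1536ff
    d_model: int = 384       # hidden size
    n_layers: int = 6        # transformer blocks
    n_heads: int = 8         # attention heads (head dim = 384 / 8 = 48)
    d_ff: int = 1536         # feed-forward width (4 * d_model)
    # Training
    max_steps: int = 20_000
    batch_size: int = 24
    # Data
    max_tokens: int = 500_000
    seq_len: int = 512
```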

Data Processing

  • 🔄 Processing new data (will cache for future use)
  • Loaded 2000 documents (skipped 0 empty/None)
  • Tokenizing texts...
  • Using 154,226 tokens
  • 💾 Cached data to data_cache/tokenized_data_2000_500000.pkl (caching logic sketched below)
  • 📊 Dataset: 138,343 train / 15,371 val samples
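
The cache path in the log (`data_cache/tokenized_data_2000_500000.pkl`) encodes the document count and token budget. A minimal sketch of the tokenize-then-cache step, assuming a generic `tokenizer.encode` API (function and argument names are illustrative):

```python
import pickle
from pathlib import Path


def load_tokens(docs, tokenizer, num_docs=2000, max_tokens=500_000,
                cache_dir="data_cache"):
    """Tokenize up to num_docs documents once; reuse the pickled result later."""
    cache = Path(cache_dir) / f"tokenized_data_{num_docs}_{max_tokens}.pkl"
    if cache.exists():
        with cache.open("rb") as f:
            return pickle.load(f)  # cache hit: skip tokenization entirely

    ids = []
    for doc in docs[:num_docs]:
        if not doc:                # skip empty/None documents
            continue
        ids.extend(tokenizer.encode(doc))
        if len(ids) >= max_tokens:
            break
    ids = ids[:max_tokens]

    cache.parent.mkdir(parents=True, exist_ok=True)
    with cache.open("wb") as f:
        pickle.dump(ids, f)        # cache for future runs
    return ids
```

The sample counts above are consistent with stride-1 sliding windows over the token stream: 154,226 - 512 = 153,714 = 138,343 + 15,371, i.e. a 90/10 train/val split.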

Training

  • 🚀 Training the Small model with the Muon optimizer
  • 🌱 Set all seeds to 42
  • 📊 Total parameters: 32,863,680
  • Muon parameters: 13,271,040
  • AdamW parameters: 19,592,640 (optimizer split sketched below)
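
The parameter split above follows the usual Muon recipe: Muon handles the 2-D weight matrices inside the transformer blocks, while AdamW covers embeddings, the output head, norms, and biases. A sketch of that split; the learning rates and the `Muon` constructor are assumptions based on the public reference implementation, not the repo's actual code:

```python
import torch

from muon import Muon  # assumed: Muon reference implementation; API may differ


def build_optimizers(model: torch.nn.Module):
    """Route 2-D block weights to Muon and everything else to AdamW."""
    muon_params, adamw_params = [], []
    for name, p in model.named_parameters():
        if p.ndim == 2 and "embed" not in name and "lm_head" not in name:
            muon_params.append(p)   # attention / MLP weight matrices
        else:
            adamw_params.append(p)  # embeddings, head, norms, biases

    opt_muon = Muon(muon_params, lr=0.02, momentum=0.95)
    opt_adamw = torch.optim.AdamW(adamw_params, lr=3e-4, weight_decay=0.1)

    # With the config above, these would print 13,271,040 (Muon) and
    # 19,592,640 (AdamW) if the split matches the repo's.
    print(f"Muon parameters:  {sum(p.numel() for p in muon_params):,}")
    print(f"AdamW parameters: {sum(p.numel() for p in adamw_params):,}")
    return opt_muon, opt_adamw
```

During training, both optimizers are stepped together on each batch.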