qwen-amharic

Private checkpoint of a GPT-style language model for Amharic.

Environment

  • Device: CUDA
  • GPU: Tesla P100-PCIE-16GB
  • GPU memory: 17.1 GB
  • 🌱 Set all seeds to 42 (seeding and device setup sketched below)
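
The seed/device setup above corresponds to a few lines of standard PyTorch. A minimal sketch, assuming a helper named `set_all_seeds` (the name is illustrative, not from the repo's actual code):

```python
import random

import numpy as np
import torch


def set_all_seeds(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch (CPU and all CUDA devices)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


set_all_seeds(42)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    props = torch.cuda.get_device_properties(0)
    # Reports e.g. "Tesla P100-PCIE-16GB (17.1 GB)" on the machine above.
    print(f"{props.name} ({props.total_memory / 1e9:.1f} GB)")
```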

Model Configuration

  • Architecture: 384-dim embeddings, 6 layers, 8 attention heads, 1536-dim feed-forward
  • Training: 20,000 steps, batch size 24
  • Data: 500,000-token budget, seq_len 512 (collected into a config sketch below)
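
For concreteness, the hyperparameters above can be collected into a single config object. A sketch; field names such as `d_model` are assumptions, not necessarily the repo's own:

```python
from dataclasses import dataclass


@dataclass
class SmallConfig:
    # Architecture: 384d, 6L, 8H, 1536ff
    d_model: int = 384       # hidden size
    n_layers: int = 6        # transformer blocks
    n_heads: int = 8         # attention heads (head dim = 384 / 8 = 48)
    d_ff: int = 1536         # feed-forward width (4 * d_model)
    # Training
    max_steps: int = 20_000
    batch_size: int = 24
    # Data
    max_tokens: int = 500_000
    seq_len: int = 512
```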

Data Processing

  • 🔄 Processing new data (will cache for future use)
  • Loaded 2000 documents (skipped 0 empty/None)
  • Tokenizing texts...
  • Using 154,226 tokens
  • 💾 Cached data to data_cache/tokenized_data_2000_500000.pkl (caching logic sketched below)
  • 📊 Dataset: 138,343 train / 15,371 val samples
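
The cache path in the log (`data_cache/tokenized_data_2000_500000.pkl`) encodes the document count and token budget. A minimal sketch of the tokenize-then-cache step, assuming a generic `tokenizer.encode` API (function and argument names are illustrative):

```python
import pickle
from pathlib import Path


def load_tokens(docs, tokenizer, num_docs=2000, max_tokens=500_000,
                cache_dir="data_cache"):
    """Tokenize up to num_docs documents once; reuse the pickled result later."""
    cache = Path(cache_dir) / f"tokenized_data_{num_docs}_{max_tokens}.pkl"
    if cache.exists():
        with cache.open("rb") as f:
            return pickle.load(f)  # cache hit: skip tokenization entirely

    ids = []
    for doc in docs[:num_docs]:
        if not doc:                # skip empty/None documents
            continue
        ids.extend(tokenizer.encode(doc))
        if len(ids) >= max_tokens:
            break
    ids = ids[:max_tokens]

    cache.parent.mkdir(parents=True, exist_ok=True)
    with cache.open("wb") as f:
        pickle.dump(ids, f)        # cache for future runs
    return ids
```

The sample counts above are consistent with stride-1 sliding windows over the token stream: 154,226 - 512 = 153,714 = 138,343 + 15,371, i.e. a 90/10 train/val split.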

Training

  • 🚀 Training the Small model with the Muon optimizer
  • 🌱 Set all seeds to 42
  • 📊 Total parameters: 32,863,680
  • Muon parameters: 13,271,040
  • AdamW parameters: 19,592,640 (optimizer split sketched below)
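
The parameter split above follows the usual Muon recipe: Muon handles the 2-D weight matrices inside the transformer blocks, while AdamW covers embeddings, the output head, norms, and biases. A sketch of that split; the learning rates and the `Muon` constructor are assumptions based on the public reference implementation, not the repo's actual code:

```python
import torch

from muon import Muon  # assumed: Muon reference implementation; API may differ


def build_optimizers(model: torch.nn.Module):
    """Route 2-D block weights to Muon and everything else to AdamW."""
    muon_params, adamw_params = [], []
    for name, p in model.named_parameters():
        if p.ndim == 2 and "embed" not in name and "lm_head" not in name:
            muon_params.append(p)   # attention / MLP weight matrices
        else:
            adamw_params.append(p)  # embeddings, head, norms, biases

    opt_muon = Muon(muon_params, lr=0.02, momentum=0.95)
    opt_adamw = torch.optim.AdamW(adamw_params, lr=3e-4, weight_decay=0.1)

    # With the config above, these would print 13,271,040 (Muon) and
    # 19,592,640 (AdamW) if the split matches the repo's.
    print(f"Muon parameters:  {sum(p.numel() for p in muon_params):,}")
    print(f"AdamW parameters: {sum(p.numel() for p in adamw_params):,}")
    return opt_muon, opt_adamw
```

During training, both optimizers are stepped together on each batch.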