# qwen-amharic

Private Amharic GPT-style model checkpoint.
## Environment
- Device: CUDA
- GPU: Tesla P100-PCIE-16GB
- Memory: 17.1 GB
- Set all seeds to 42
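A minimal sketch of the seeding step above. The helper name `set_all_seeds` is hypothetical, and the torch calls are guarded so the snippet also runs on a machine without the CUDA stack installed.

```python
import os
import random

import numpy as np


def set_all_seeds(seed: int = 42) -> None:
    """Seed every RNG a training run typically touches (hypothetical helper)."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        # torch is only assumed to be present on the training machine
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


set_all_seeds(42)
```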
## Model Configuration
- Architecture: 384-dim model, 6 layers, 8 attention heads, 1536-dim feed-forward
- Training: 20000 steps, batch size 24
- Data: 500,000 tokens, seq_len 512
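The hyperparameters above can be collected into a small dataclass; the name `ModelConfig` and the field names are illustrative, not taken from the repo.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Architecture (from the card: 384d, 6L, 8H, 1536ff)
    d_model: int = 384
    n_layers: int = 6
    n_heads: int = 8
    d_ff: int = 1536
    # Training
    max_steps: int = 20_000
    batch_size: int = 24
    # Data
    max_tokens: int = 500_000
    seq_len: int = 512

    @property
    def head_dim(self) -> int:
        # 384 dims split over 8 heads -> 48 dims per head
        return self.d_model // self.n_heads


cfg = ModelConfig()
```

Note that the feed-forward width follows the common 4x rule (1536 = 4 x 384).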
## Data Processing
- Processing new data (will cache for future use)
- Loaded 2000 documents (skipped 0 empty/None)
- Tokenizing texts...
- Using 154,226 tokens
- Cached data to data_cache/tokenized_data_2000_500000.pkl
- Dataset: 138,343 train / 15,371 val samples
## Training
- Training Small model with Muon optimizer
- Set all seeds to 42
- Total parameters: 32,863,680
- Muon parameters: 13,271,040
- AdamW parameters: 19,592,640
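The parameter counts above reflect the usual Muon convention: Muon optimizes the 2-D hidden weight matrices, while embeddings, norms, biases, and the output head stay on AdamW. A shape-based split like the one below is a common way to produce the two groups, sketched here over `(name, shape)` pairs rather than real torch parameters; the parameter names and shapes are illustrative.

```python
def split_params(named_shapes):
    """Route 2-D hidden matrices to Muon, everything else to AdamW."""
    muon, adamw = [], []
    for name, shape in named_shapes:
        is_hidden_matrix = (
            len(shape) == 2 and "embed" not in name and "head" not in name
        )
        (muon if is_hidden_matrix else adamw).append(name)
    return muon, adamw


# Toy parameter list mirroring one small transformer block (shapes illustrative).
params = [
    ("embed.weight", (32000, 384)),
    ("blocks.0.attn.qkv.weight", (1152, 384)),
    ("blocks.0.attn.proj.weight", (384, 384)),
    ("blocks.0.mlp.fc1.weight", (1536, 384)),
    ("blocks.0.mlp.fc2.weight", (384, 1536)),
    ("blocks.0.norm.weight", (384,)),
    ("lm_head.weight", (32000, 384)),
]
muon_names, adamw_names = split_params(params)
```

Under this split the AdamW group (embeddings, head, norms) dominates the parameter count in a small model, consistent with the 19.6M vs. 13.3M figures on the card.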