Discord: https://discord.gg/DUzP7CXqJt , https://discord.gg/jzwR7jFfSB | Website: https://calmacatai.draklor.ru
License
This model is licensed under the MIT License.
CalmaCatLM-1.5-mini
Experimental, under-trained model (~12M parameters) based on a custom Transformer decoder architecture (see Model Details below).
Primarily supports English. This is my third model.
Description
CalmaCatLM is an experimental generative language model designed for text generation and dialogue tasks.
The main goal of this project is to test the full pipeline: from implementing the architecture and training from scratch to uploading models to the Hugging Face Hub.
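To make the last step of that pipeline concrete, here is a minimal upload sketch using the huggingface_hub client. The repository ID and local folder path are placeholders for illustration, not the actual values used for this model.

```python
# Minimal sketch of pushing a locally trained checkpoint to the Hugging Face Hub.
# The repo_id and folder_path below are placeholders, not the actual ones used here.
from huggingface_hub import HfApi

api = HfApi()  # picks up the token saved by `huggingface-cli login`

# Create the model repository if it does not exist yet.
api.create_repo(repo_id="your-username/CalmaCatLM-1.5-mini", repo_type="model", exist_ok=True)

# Upload the checkpoint, tokenizer files, and model card in one call.
api.upload_folder(
    folder_path="./checkpoints/calmacatlm-1.5-mini",
    repo_id="your-username/CalmaCatLM-1.5-mini",
    repo_type="model",
)
```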
Model Details
- Architecture: Custom Transformer Decoder (6 layers, 6 attention heads); a rough configuration sketch follows this list
- Model size: ~12M parameters
- Training Approach: Pre-trained from scratch on my own dataset
- Languages: Primarily English
- License: MIT
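For concreteness, below is a minimal PyTorch sketch of a decoder-only configuration with the 6-layer/6-head shape listed above. The embedding width, feed-forward size, and vocabulary size are illustrative assumptions (they are not stated in this card), chosen so the parameter count lands near ~12M; the real model uses a custom implementation rather than the built-in PyTorch layers used here.

```python
# Illustrative decoder-only configuration; D_MODEL, the feed-forward size, and
# VOCAB_SIZE are assumptions (not given in the card), picked so that the total
# parameter count lands near ~12M.
import torch
import torch.nn as nn

VOCAB_SIZE = 4096   # assumed
D_MODEL    = 384    # assumed, divisible by the 6 attention heads
N_LAYERS   = 6
N_HEADS    = 6
MAX_LEN    = 128    # matches the training max sequence length

class TinyDecoderLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.pos_emb = nn.Embedding(MAX_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=N_HEADS, dim_feedforward=4 * D_MODEL,
            batch_first=True, norm_first=True,
        )
        # An "encoder" stack plus a causal mask behaves as a decoder-only LM.
        self.blocks = nn.TransformerEncoder(layer, num_layers=N_LAYERS)
        self.lm_head = nn.Linear(D_MODEL, VOCAB_SIZE, bias=False)
        self.lm_head.weight = self.tok_emb.weight  # weight tying keeps the model small

    def forward(self, ids):  # ids: (batch, seq_len)
        seq_len = ids.size(1)
        pos = torch.arange(seq_len, device=ids.device)
        x = self.tok_emb(ids) + self.pos_emb(pos)
        causal = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=ids.device), diagonal=1
        )
        x = self.blocks(x, mask=causal)
        return self.lm_head(x)  # (batch, seq_len, vocab)

print(sum(p.numel() for p in TinyDecoderLM().parameters()) / 1e6, "M parameters")
```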
Training Details
- Dataset: my own dataset
- Hardware: Single AMD RX 7700 XT (12GB VRAM)
- Training Status: Very early checkpoint (Under-trained)
- Epochs: 100
- Batch size: 32
- Optimizer: AdamW, lr = 3e-4
- Max sequence length: 128 tokens (a training-loop sketch using these settings is shown below)
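To show how these hyperparameters fit together, here is a minimal training-loop sketch. It reuses the TinyDecoderLM class from the configuration sketch above, and the dataset tensors are random placeholders standing in for the real tokenized corpus; this is not the actual training script.

```python
# Minimal next-token-prediction training loop using the hyperparameters above.
# TinyDecoderLM comes from the earlier configuration sketch; the token ids here
# are random placeholders, not the real training data.
import torch
from torch.utils.data import DataLoader, TensorDataset

MAX_LEN, BATCH_SIZE, EPOCHS, LR = 128, 32, 100, 3e-4
device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder corpus: random token ids shaped like the real tokenized data.
token_ids = torch.randint(0, 4096, (1024, MAX_LEN + 1))
loader = DataLoader(TensorDataset(token_ids), batch_size=BATCH_SIZE, shuffle=True)

model = TinyDecoderLM().to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=LR)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for (batch,) in loader:
        batch = batch.to(device)
        inputs, targets = batch[:, :-1], batch[:, 1:]  # shift by one token
        logits = model(inputs)                         # (B, MAX_LEN, VOCAB_SIZE)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch + 1}: loss {loss.item():.3f}")
```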