Discord: https://discord.gg/DUzP7CXqJt, https://discord.gg/jzwR7jFfSB | Website: https://calmacatai.draklor.ru

License

This model is licensed under the MIT License.

CalmaCatLM-1.5-mini

🚧 Experimental, under-trained model (~12M parameters) based on a custom Transformer decoder architecture (see Model Details below).
Primarily supports English πŸ‡¬πŸ‡§. This is my third model.

πŸ“– Description

CalmaCatLM is an experimental generative language model designed for text generation and dialogue tasks.
The main goal of this project is to test the full pipeline: from implementing the architecture and training from scratch to uploading models to the Hugging Face Hub.
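
A minimal usage sketch follows. It assumes the checkpoint is published in a transformers-compatible format under a repo id like `CalmaCat/CalmaCatLM-1.5-mini`; both the repo id and the `trust_remote_code=True` requirement are assumptions, since this card describes a custom architecture and does not state how the weights are packaged.

```python
# Hypothetical usage sketch; repo id and transformers compatibility are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "CalmaCat/CalmaCatLM-1.5-mini"  # assumed repo id, not confirmed by this card
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```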

βš™οΈ Model Details

  • Architecture: Custom Transformer decoder (6 layers, 6 attention heads); a parameter-count sketch follows this list
  • Model size: ~12M parameters
  • Training approach: Pre-trained from scratch on the author's own dataset
  • Languages: Primarily English
  • License: MIT
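
For orientation, here is a back-of-the-envelope PyTorch sketch of a decoder with the listed shape. The hidden size (384) and vocabulary size (4096) are assumptions, not values from this card; they are chosen only because they make the total land near the advertised ~12M parameters.

```python
# Parameter-count sketch for the stated shape (6 layers, 6 heads, ~12M params).
# d_model, vocab_size, and weight tying are ASSUMED; only layers/heads/max_len
# come from this card.
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 384, 6, 6   # 384 = 6 heads x 64-dim heads (assumed)
vocab_size, max_len = 4096, 128          # assumed vocab; max_len matches training

class TinyDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(  # used causally via an attention mask
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.tok_emb.weight  # weight tying (assumed)

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        mask = nn.Transformer.generate_square_subsequent_mask(idx.size(1)).to(idx.device)
        return self.lm_head(self.blocks(x, mask=mask))

model = TinyDecoder()
print(sum(p.numel() for p in model.parameters()) / 1e6, "M params")  # ~12M
```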

πŸ‹οΈ Training Details

  • Dataset: Author's own custom dataset
  • Hardware: Single AMD RX 7700 XT (12GB VRAM)
  • Training Status: Very early checkpoint (Under-trained)
  • Epochs: 100
  • Batch size: 32
  • Optimizer: AdamW, lr = 3e-4
  • Max sequence length: 128 tokens
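
Below is a minimal training-loop sketch that reproduces the hyperparameters listed above (AdamW at lr 3e-4, batch size 32, sequence length 128, 100 epochs). The data here is a random placeholder, since the actual dataset is not published, and the model is the hypothetical `TinyDecoder` sketch from the Model Details section, not the author's actual training code.

```python
# Training-loop sketch; hyperparameters come from this card, everything else
# (data, model) is a stand-in.
import torch
import torch.nn.functional as F

SEQ_LEN, BATCH, EPOCHS, LR = 128, 32, 100, 3e-4   # values from this card
device = "cuda" if torch.cuda.is_available() else "cpu"

model = TinyDecoder().to(device)                   # sketch model defined above
optimizer = torch.optim.AdamW(model.parameters(), lr=LR)

# Random token ids standing in for the (unpublished) custom dataset.
data = torch.randint(0, 4096, (1024, SEQ_LEN + 1))
loader = torch.utils.data.DataLoader(data, batch_size=BATCH, shuffle=True)

model.train()
for epoch in range(EPOCHS):
    for batch in loader:
        batch = batch.to(device)
        inputs, targets = batch[:, :-1], batch[:, 1:]   # shift for next-token loss
        logits = model(inputs)
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```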