Train Your Own Small Language Model
A minimal toolkit for training and using small language models with the Muon optimizer.
Quick Start
Option 1: Google Colab (No Setup Required)
Click the badge above to run everything in your browser with free GPU access!
Option 2: Local Setup
# Clone and setup
git clone https://github.com/vukrosic/build-and-release-your-own-llm
cd build-and-release-your-own-llm
python setup.py # Installs requirements and creates .env file
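In rough terms, setup.py does something like the following (a sketch, not the script's exact contents):

```python
# Rough sketch of what setup.py does (illustrative, not the actual script).
import shutil
import subprocess
import sys
from pathlib import Path

# Install the Python dependencies listed in requirements.txt.
subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"])

# Create a local .env from the template if one doesn't exist yet.
if not Path(".env").exists():
    shutil.copy(".env.example", ".env")
    print("Created .env from .env.example -- edit it before pushing to the Hub.")
```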
Three Ways to Use This Project
1. Quick Start - Use My Pre-trained Model
Want to try text generation immediately?
# Install dependencies
pip install -r requirements.txt
# Run inference with my pre-trained model
python inference.py
The script will:
- Show available checkpoints from vukrosic/blueberry-1
- Download the model automatically
- Let you generate text interactively
No setup required! The model downloads automatically.
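Under the hood, the script pulls the checkpoint files from the Hub. If you want to fetch them yourself, a minimal sketch with huggingface_hub looks like this (the files inside the repo are described under Checkpoint Management below):

```python
# Sketch: download the pre-trained checkpoint files from the Hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="vukrosic/blueberry-1")
print("Checkpoint files downloaded to:", local_dir)
```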
2. Train Your Own Model
Want to train from scratch?
# Install dependencies
pip install -r requirements.txt
# Start training (takes ~20 minutes on GPU)
python train_llm.py
# Use your trained model
python inference.py
Your model will be saved in checkpoints/, and you can resume training anytime.
3. Train and Share Your Model
Want to share your model on Hugging Face?
# 1. Copy environment template
cp .env.example .env
# 2. Edit .env file:
# HF_REPO_NAME=your-username/your-model-name
# HF_TOKEN=hf_your_token_here
# PUSH_TO_HUB=true
# 3. Train (uploads automatically)
python train_llm.py
Get your HF token from: https://huggingface.co/settings/tokens
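The training script picks these values up from the environment. A minimal sketch of that config step, assuming python-dotenv is used to read the .env file (the actual handling in train_llm.py may differ):

```python
# Sketch: load the Hub settings from .env (assumes python-dotenv).
import os
from dotenv import load_dotenv

load_dotenv()  # makes HF_REPO_NAME, HF_TOKEN, PUSH_TO_HUB available via os.getenv

repo_name = os.getenv("HF_REPO_NAME")
hf_token = os.getenv("HF_TOKEN")
push_to_hub = os.getenv("PUSH_TO_HUB", "false").lower() == "true"

if push_to_hub and not (repo_name and hf_token):
    raise ValueError("Set HF_REPO_NAME and HF_TOKEN in .env when PUSH_TO_HUB=true")
```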
Project Structure
├── train_llm.py       # Training script with Muon optimizer
├── inference.py       # Text generation and model loading
├── upload_to_hf.py    # Upload checkpoints to Hugging Face
├── example_usage.py   # Example workflow script
├── setup.py           # Easy setup script
├── requirements.txt   # Python dependencies
├── .env.example       # Environment variables template
└── README.md          # This file
What You Get
- 21M parameter transformer model (384d, 6 layers, 8 heads)
- Muon optimizer for efficient training (see the sketch after this list)
- Automatic checkpointing every 5000 steps
- Resume training from any checkpoint
- Interactive text generation
- Hugging Face integration (optional)
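The Muon update replaces a plain gradient step with an orthogonalized momentum step on each 2D weight matrix. Below is a minimal sketch of that idea following the published Muon recipe (Newton-Schulz orthogonalization); the actual optimizer in train_llm.py may differ in details such as learning rate, dtype, and which parameters it covers.

```python
# Minimal sketch of the Muon update (published recipe); train_llm.py may differ in details.
import torch

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately orthogonalize a 2D update matrix with a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315
    transposed = G.size(0) > G.size(1)
    X = G.float().T if transposed else G.float()
    X = X / (X.norm() + 1e-7)  # bound the spectral norm by 1 so the iteration converges
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(param, momentum_buf, lr=0.02, momentum=0.95):
    """One Muon step for a single 2D weight matrix (param.grad must already be populated)."""
    momentum_buf.mul_(momentum).add_(param.grad)
    update = param.grad.add(momentum_buf, alpha=momentum)  # Nesterov-style lookahead
    param.data.add_(newton_schulz_orthogonalize(update), alpha=-lr)
```

Non-matrix parameters (embeddings, norms, biases) are typically trained with a standard optimizer such as AdamW alongside Muon.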
Expected Results
- Training time: ~16-20 minutes on a modern GPU
- Final perplexity: ~1.06
- Model size: ~21M parameters
- Memory usage: ~4-6GB GPU
Customization
Change Model Size
Edit train_llm.py:
@dataclass
class ModelConfig:
    d_model: int = 512     # Bigger model (was 384)
    n_layers: int = 8      # More layers (was 6)
    max_steps: int = 5000  # Increase (e.g. to 20000) to train longer for better results
Use Your Own Data
Edit the dataset loading in train_llm.py:
# Replace this line:
dataset = load_dataset("HuggingFaceTB/smollm-corpus", "cosmopedia-v2", split="train", streaming=True)
# With your dataset:
dataset = load_dataset("your-dataset-name", split="train", streaming=True)
Adjust Training Speed
batch_size: int = 16 # Smaller = less memory
gradient_accumulation_steps: int = 8 # Increase to keep the same effective batch size
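The effective batch size is batch_size × gradient_accumulation_steps, so you can shrink one and grow the other to trade GPU memory for extra forward/backward passes. A minimal sketch of the accumulation pattern (illustrative; not the exact loop in train_llm.py):

```python
import torch
import torch.nn.functional as F

def train_with_accumulation(model, loader, optimizer, grad_accum_steps=8):
    """Effective batch size = loader batch size * grad_accum_steps."""
    model.train()
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        logits = model(inputs)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        (loss / grad_accum_steps).backward()  # scale so the accumulated gradients average out
        if (step + 1) % grad_accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```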
Understanding the Output
During Training
Training: 67%|███████   | 20000/30000 [12:34<06:15, 26.6it/s, loss=1.234, acc=0.876, ppl=3.4, lr=8.5e-03]
- loss: Lower is better (target: ~1.0)
- acc: Accuracy (target: ~98%)
- ppl: Perplexity (target: ~1.1)
- lr: Learning rate (automatically scheduled)
During Inference
Prompt: The future of AI is
Generated text: The future of AI is bright and full of possibilities. Machine learning algorithms continue to evolve...
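Generation is just repeated next-token sampling, and temperature / top_k (both set in inference.py) control how adventurous it is. A minimal sketch of top-k sampling with temperature, assuming a model that maps token ids to next-token logits:

```python
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=50, temperature=0.8, top_k=40):
    """Assumes model(ids) returns logits of shape (batch, seq_len, vocab_size)."""
    ids = torch.tensor([tokenizer.encode(prompt)])
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :] / temperature        # logits for the next token
        topk_vals, topk_idx = torch.topk(logits, top_k)    # keep only the k most likely tokens
        probs = torch.softmax(topk_vals, dim=-1)
        next_id = topk_idx.gather(-1, torch.multinomial(probs, num_samples=1))
        ids = torch.cat([ids, next_id], dim=-1)
    return tokenizer.decode(ids[0].tolist())
```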
Common Issues
"CUDA out of memory"
# In train_llm.py, reduce batch size:
batch_size: int = 12 # or even 8
"No checkpoints found"
Make sure you've run training first:
python train_llm.py # Wait for it to complete
python inference.py # Now this will work
"HF upload failed"
Check your token permissions:
- Go to https://huggingface.co/settings/tokens
- Make sure token has "Write" permission
- Update your .env file
What's Next?
- Experiment with prompts - Try different starting texts
- Adjust generation parameters - Change temperature and top_k in inference.py
- Train on your data - Replace the dataset with your own text
- Scale up - Increase model size for better performance
- Share your model - Upload to Hugging Face for others to use
Checkpoint Management
Automatic Checkpointing
The training script saves checkpoints every 5000 steps in the checkpoints/ directory:
checkpoints/
├── checkpoint_step_5000/
│   ├── model.pt           # Model weights and optimizer state
│   ├── config.json        # Model configuration
│   └── tokenizer files    # Tokenizer configuration
├── checkpoint_step_10000/
└── checkpoint_step_15000/
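Resuming works because model.pt bundles the model weights together with the optimizer state. A minimal sketch of the save/resume pattern (the dictionary keys are assumptions, not necessarily the ones train_llm.py uses):

```python
import torch

def save_checkpoint(model, optimizer, step, path):
    # Bundle everything needed to resume training into a single file.
    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict(), "step": step},
        path,
    )

def load_checkpoint(model, optimizer, path):
    # Restore weights and optimizer state; returns the step to resume from.
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```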
Upload to Hugging Face
Share your trained models with the community:
# Set your Hugging Face token
export HF_TOKEN="hf_your_token_here"
# List available checkpoints
python upload_to_hf.py --list
# Upload latest checkpoint
python upload_to_hf.py --repo-name username/my-awesome-model
# Upload specific checkpoint
python upload_to_hf.py --repo-name username/my-model --checkpoint checkpoints/checkpoint_step_10000
# Create private repository
python upload_to_hf.py --repo-name username/my-model --private
Get your token from: https://huggingface.co/settings/tokens
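The essential Hub calls behind upload_to_hf.py look roughly like this (the repo name and checkpoint path are the placeholders from the commands above; the script's own flag handling may differ):

```python
# Sketch: push a checkpoint directory to the Hugging Face Hub.
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
api.create_repo("username/my-awesome-model", private=False, exist_ok=True)
api.upload_folder(
    repo_id="username/my-awesome-model",
    folder_path="checkpoints/checkpoint_step_10000",
)
```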
Example Workflow
# Run the complete example
python example_usage.py
# Or step by step:
python train_llm.py # Train model (saves checkpoints)
python upload_to_hf.py --list # See available checkpoints
python upload_to_hf.py --repo-name username/model # Upload to HF
Pro Tips
- Resume training: The script automatically detects checkpoints
- Monitor GPU usage: Use nvidia-smi to check memory usage
- Save compute: Use smaller models for experimentation
- Better results: More training steps = better model (usually)
- Checkpoint frequency: Adjust save_every in ModelConfig for different intervals
- Share early: Upload intermediate checkpoints to track training progress
Happy training!