# SmolLM-125M: A Lightweight Language Model for Consumer Hardware
This is a 125M parameter language model designed to be trained and run on consumer hardware with limited VRAM (4GB+). It follows a GPT-style architecture and uses training-time memory optimizations (gradient accumulation, length-based batch scheduling, pre-padded sequences) to fit that budget.
## Model Details
- Architecture: GPT-style Transformer
- Parameters: 125M
- Context Length: 512 tokens
- Vocabulary: 50,257 tokens (GPT-2 tokenizer)
- Training Data: WikiText-2
- Hardware Requirements: 4GB+ VRAM GPU
## Architecture Specifications
- Layers: 12 transformer blocks
- Attention Heads: 12
- Embedding Dimension: 768
- Activation: GELU
- Layer Normalization: Pre-norm
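As a sanity check, these hyperparameters are consistent with the quoted parameter count. A back-of-the-envelope sketch (assuming GPT-2-style blocks with tied input/output embeddings; biases and LayerNorms omitted, so the exact total depends on the implementation):

```python
# Rough parameter count for the configuration above. Assumes GPT-2-style
# blocks with tied input/output embeddings; biases and LayerNorms are
# omitted for simplicity.
n_layer, n_embd, vocab_size, block_size = 12, 768, 50257, 512

embeddings = vocab_size * n_embd + block_size * n_embd  # token + position
per_block = 12 * n_embd * n_embd  # attention (4 * d^2) + MLP (8 * d^2)
total = embeddings + n_layer * per_block

print(f"~{total / 1e6:.0f}M parameters")  # ~124M, matching the quoted 125M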
## Training Details
- Hardware Used: GTX 1650 (4GB VRAM)
- Training Time: ~4 hours
- Batch Size: 4 (effective 16 with 4-step gradient accumulation)
- Learning Rate: 3e-4 with cosine decay
- Weight Decay: 0.1
- Optimizer: AdamW
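A minimal sketch of this optimizer setup in PyTorch. The total step count and the `nn.Linear` model are placeholders (the card does not report the step count, and in this repo the model would be `SmallLanguageModel`):

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 768)  # stand-in for SmallLanguageModel
total_steps = 10_000         # hypothetical; the card does not state the step count

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

for step in range(total_steps):
    # ... forward/backward pass on a batch goes here ...
    optimizer.step()
    scheduler.step()  # cosine-decay the learning rate over training
    optimizer.zero_grad()
```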
## Memory Optimizations
- Length-based (dynamic) batch scheduling
- Gradient accumulation (4 steps; see the sketch below)
- Pre-padded sequences
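The gradient-accumulation loop looks roughly like the sketch below. The model, loss, and random batches are toy stand-ins so the snippet runs on its own; in this repo the model would be `SmallLanguageModel` and the batches would come from the WikiText-2 loader.

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 50257)  # toy stand-in for SmallLanguageModel
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
loss_fn = nn.CrossEntropyLoss()

ACCUM_STEPS = 4  # micro-batch of 4 -> effective batch of 16

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(4, 768)            # micro-batch of 4 (toy features)
    y = torch.randint(0, 50257, (4,))  # toy next-token targets
    loss = loss_fn(model(x), y)
    (loss / ACCUM_STEPS).backward()    # scale so gradients average over the window
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()               # one optimizer update per 4 micro-batches
        optimizer.zero_grad()
```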
## Usage

```python
from transformers import AutoTokenizer

from model import SmallLanguageModel, ModelConfig

# Build the model with the architecture described above
config = ModelConfig(
    vocab_size=50257,  # GPT-2 vocabulary
    block_size=512,    # context length
    n_layer=12,
    n_head=12,
    n_embd=768,
    dropout=0.1,
    bias=True,
)
model = SmallLanguageModel(config)
model.eval()  # load your trained checkpoint (e.g. via model.load_state_dict) before generating

# Generate text with the GPT-2 tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
input_text = "Once upon a time"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_length=100)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
```
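On a 4GB GPU the same snippet can run on CUDA. A minimal continuation of the example above, assuming `SmallLanguageModel` is a standard `torch.nn.Module`:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
input_ids = input_ids.to(device)

with torch.no_grad():  # inference only; avoids storing activations for backprop
    output_ids = model.generate(input_ids, max_length=100)
```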
## Limitations
- Limited context window (512 tokens)
- Smaller capacity compared to larger models
- Training data limited to WikiText-2
## License
This model is released under the MIT License.
## Dataset used to train waghmareps12/SmolLM_125M
- WikiText-2
## Evaluation results
- Perplexity on WikiText-2: to be updated (self-reported)
- Loss on WikiText-2: to be updated (self-reported)