SmalLM
SmalLM is a series of small transformer models built from scratch for language modeling. This project is designed to explore innovative approaches to transformer architectures through modular pipelines for pretraining, fine-tuning, and alignment.
Usage:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Azrail/smallm_70")
model = AutoModelForCausalLM.from_pretrained("Azrail/smallm_70", trust_remote_code=True)

inputs = tokenizer("How are you?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.batch_decode(out))
```
Key Features:
Grouped Query Attention (GQA).
Mixture-of-Experts (MoE) with auxiliary-loss-free load balancing (sketched after this list).
ALiBi (Attention with Linear Biases) or Rotary Position Embedding (RoPE).
NTK-by-parts RoPE interpolation for extending the context length.
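As a sketch of the auxiliary-loss-free balancing idea: the router adds a non-learned per-expert bias to the affinity scores only when selecting the top-k experts, and after each step nudges that bias toward underloaded experts, so no auxiliary balancing loss is needed. The snippet below is a minimal illustration under these assumptions; the function names, `gamma`, and tensor shapes are hypothetical, not the repo's actual implementation.

```python
import torch

def topk_routing(scores, expert_bias, k):
    """Select top-k experts per token with biased scores, but gate with raw scores."""
    # scores: [num_tokens, num_experts] router affinities; expert_bias: [num_experts]
    topk_idx = (scores + expert_bias).topk(k, dim=-1).indices
    topk_weight = torch.gather(scores, dim=-1, index=topk_idx)  # bias never scales outputs
    return topk_idx, topk_weight

@torch.no_grad()
def update_expert_bias(expert_bias, topk_idx, gamma=1e-3):
    """Lower the bias for overloaded experts and raise it for underloaded ones."""
    load = torch.bincount(topk_idx.flatten(), minlength=expert_bias.numel()).float()
    expert_bias -= gamma * torch.sign(load - load.mean())
    return expert_bias
```

Only `topk_weight`, computed from the unbiased scores, scales the expert outputs, so the bias steers the load distribution without contributing a gradient term.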
Pre-Training:
Model | Training Data | Steps | Context Length | Tokens | LR | Batch Size | Precision |
---|---|---|---|---|---|---|---|
SmalLM-70M | smollm-corpus | 70k | 1024 | 18B | 1e-3 | 0.25M | bfloat16 |
SmalLM-150M | smollm-corpus | - | 1024 | - | - | - | bfloat16 |
SmalLM-350M | smollm-corpus | - | 1024 | - | - | - | bfloat16 |
SmalLM-500M | smollm-corpus | - | 1024 | - | - | - | bfloat16 |
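Read as a configuration, the SmalLM-70M row corresponds roughly to the sketch below; the field names and the `HuggingFaceTB/smollm-corpus` dataset id are assumptions, and the 0.25M batch size is interpreted as tokens per step.

```python
from dataclasses import dataclass

@dataclass
class PretrainConfig:
    # Hypothetical mirror of the SmalLM-70M row above; names are illustrative only.
    dataset: str = "HuggingFaceTB/smollm-corpus"   # assumed Hub id for smollm-corpus
    context_length: int = 1024
    train_steps: int = 70_000
    total_tokens: int = 18_000_000_000
    learning_rate: float = 1e-3
    batch_size_tokens: int = 250_000               # "0.25M", presumably tokens per step
    dtype: str = "bfloat16"
```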
Evaluation: Results are computed with lm-evaluation-harness (an example invocation follows the table).
Model | MMLU | ARC easy/hard | PIQA | HellaSwag | OBQA | Winogrande |
---|---|---|---|---|---|---|
SmalLM-70M | 25.33 | 51.47/25.68 | 61.75 | 30.31 | 30.8 | 50.83 |
SmalLM-150M | - | - | - | - | - | - |
SmalLM-350M | - | - | - | - | - | - |
SmalLM-500M | - | - | - | - | - | - |
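A run of this kind can be reproduced with the lm-evaluation-harness Python API (v0.4+); the task list and batch size below are assumptions and may need adjusting to match the reported setup.

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Azrail/smallm_70,trust_remote_code=True",
    tasks=["mmlu", "arc_easy", "arc_challenge", "piqa",
           "hellaswag", "openbookqa", "winogrande"],
    batch_size=16,
)
print(results["results"])
```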
Procedure: