---
language: en
license: mit
library_name: pytorch
tags:
- transformer
- adapters
- continual-learning
- dual-memory
- minimal
- educational
- nlp
- language-model
- online-learning
datasets:
- text8
- tinyshakespeare
model_name: "Microformer"
model_type: "stacked-adapter-transformer"
pipeline_tag: text-generation
widget:
- text: "Describe the internet"
- text: "Who is Buck?"
- text: "Call me Ishmael."
---

# Microformer

**Microformer** is a minimal, educational-scale transformer language model built from scratch in PyTorch.  
Inspired by [nanoGPT](https://github.com/karpathy/nanoGPT) and OpenAI’s GPT-1, Microformer is designed for learning, experimentation, and prototyping on lightweight datasets like [text8](https://mattmahoney.net/dc/textdata.html) or Tiny Shakespeare.

---

## Features

- Decoder-only transformer (GPT-style) architecture
- **Stacked adapters per layer for dual-memory:**
    - **Long-term adapters** (for corpus/knowledge facts)
    - **Session adapters** (for rapid, online, user/session-specific learning)
- Choice of character-level **or** subword/BPE tokenization (configurable)
- Learnable positional encoding
- Multi-head self-attention
- Configurable depth, embedding size, sequence length, and attention heads
- Simple end-to-end pipeline: preprocessing, training, and text generation
- Modular, readable code ideal for educational use and tinkering
- Temperature and multinomial sampling in text generation (sketched below)
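
That last feature, temperature-scaled multinomial sampling, is the standard way to trade off diversity against coherence when generating text. A minimal, generic PyTorch sketch of the idea (not the exact code in `scripts/generate_text.py`):

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0) -> int:
    """Sample one token id from a 1-D logits vector of shape [vocab_size]."""
    # Temperature < 1.0 sharpens the distribution, > 1.0 flattens it.
    scaled = logits / max(temperature, 1e-6)
    probs = F.softmax(scaled, dim=-1)
    # Multinomial sampling draws a token in proportion to its probability.
    return torch.multinomial(probs, num_samples=1).item()
```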

---

## What’s Unique: Stacked Adapters for Dual-Memory Learning

Microformer implements **two adapters in every transformer block**:

- **Long-term adapter:**  
  Trained with your full corpus during batch/corpus training.  
  Stores stable, general “knowledge” (e.g., literary style, factual info).

- **Session adapter:**  
  Starts blank and is trained *on the fly* during chat or interactive teaching.  
  Lets you rapidly “teach” new facts, styles, or user preferences without overwriting core knowledge.

At inference, the outputs of both adapters (plus the core transformer) are combined, giving the model both stable, long-term knowledge and flexible, session-specific memory, loosely analogous to long-term and short-term memory.
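
To make this concrete, here is a minimal sketch of what a block with stacked adapters can look like. It is an illustrative assumption, not the exact wiring in `models/model.py`; the class and attribute names (`Adapter`, `long_term_adapter`, `session_adapter`) are placeholders:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter (Houlsby-style): down-project, nonlinearity, up-project."""
    def __init__(self, embed_dim: int, adapter_dim: int):
        super().__init__()
        self.down = nn.Linear(embed_dim, adapter_dim)
        self.up = nn.Linear(adapter_dim, embed_dim)

    def forward(self, x):
        return self.up(torch.relu(self.down(x)))

class DualAdapterBlock(nn.Module):
    """Illustrative pre-norm transformer block with long-term and session adapters."""
    def __init__(self, embed_dim=128, num_heads=4, ff_dim=256, adapter_dim=32):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(embed_dim, ff_dim), nn.GELU(), nn.Linear(ff_dim, embed_dim)
        )
        self.long_term_adapter = Adapter(embed_dim, adapter_dim)  # corpus knowledge
        self.session_adapter = Adapter(embed_dim, adapter_dim)    # on-the-fly learning

    def forward(self, x, attn_mask=None):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + attn_out
        h = self.ff(self.norm2(x))
        # Both adapter outputs are added alongside the feed-forward output,
        # so long-term and session memory contribute at every layer.
        return x + h + self.long_term_adapter(h) + self.session_adapter(h)
```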

---

## Project Structure

```
microformer/
├── config.py              # Hyperparameters and model settings
├── data/
│   ├── corpus.txt         # Raw training text
│   ├── train.pt           # Preprocessed training tensor (token IDs)
│   ├── val.pt             # Validation tensor (token IDs)
│   ├── vocab.json         # Vocabulary (char or subword, stoi/itos mapping)
│   └── tokenizer.json     # (optional) BPE tokenizer file if using subwords
├── models/
│   └── model.py           # Transformer model definition (Microformer)
├── scripts/
│   ├── prepare_data.py    # Data preprocessing/tokenization
│   ├── train.py           # Training script (trains long-term adapters)
│   ├── generate_text.py   # Inference/generation + online learning (session adapters)
│   └── tokenizer_setup.py # BPE Tokenizer
└── README.md
```

---

## Quickstart

1. **Prepare your corpus**

   Place your text data in `data/corpus.txt`.

2. **Choose your tokenizer:**

   - **Character-level (default):**  
     No extra steps needed.

   - **BPE/subword (recommended for rich or modern text):**
     ```bash
     python scripts/tokenizer_setup.py --input data/corpus.txt --vocab_size 1000
     ```

3. **Prepare the dataset**

   ```bash
   python scripts/prepare_data.py
   ```

4. **Train the model (long-term knowledge)**

   ```bash
   python scripts/train.py
   ```
    - This trains the core transformer weights and the **long-term adapters**.
    - Session adapters remain untrained (blank) until chat time.

5. **Generate text and teach interactively (session memory)**

   ```bash
   python scripts/generate_text.py
   ```
    - Loads your trained model.
    - Prompts for a seed string and temperature.
    - **Allows you to “teach” new facts on the fly!**
    - New knowledge is stored in the session adapters and does *not* overwrite long-term knowledge (see the sketch below).
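
In spirit, the on-the-fly teaching boils down to: freeze everything except the session adapters, then take a few gradient steps on the (prompt, answer) pair you just typed. A hedged sketch under that assumption; `model` and `encode` here are placeholders, not the actual interface of `generate_text.py`:

```python
import torch
import torch.nn.functional as F

def teach_session(model, encode, prompt: str, answer: str, steps: int = 20, lr: float = 1e-3):
    """Update only session-adapter parameters on one user-provided example."""
    # Freeze everything except parameters belonging to a session adapter.
    for name, param in model.named_parameters():
        param.requires_grad = "session_adapter" in name
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=lr)

    # Standard next-token objective on the concatenated prompt + answer.
    ids = torch.tensor([encode(prompt + " " + answer)], dtype=torch.long)
    inputs, targets = ids[:, :-1], ids[:, 1:]

    model.train()
    for _ in range(steps):
        logits = model(inputs)  # expected shape: [1, seq_len, vocab_size]
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    model.eval()
```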

---

## Example Config (`config.py`)

```python
EMBED_DIM = 128
NUM_HEADS = 4
NUM_LAYERS = 2
FF_DIM = 256
MAX_SEQ_LEN = 128
BATCH_SIZE = 32
ADAPTER_DIM = 32   # Used for both long-term and session adapters
VOCAB_SIZE = 100   # Set automatically from tokenizer/vocab
```

---

## Using the Dual-Memory System

- **Long-term adapters:**  
  Learned during `train.py`; they persist between runs.

- **Session adapters:**  
  Learned during interactive chat in `generate_text.py`; they can optionally be reset between users or sessions (see the reset sketch after this list).

- **Teach new facts by entering a prompt and providing your ideal answer.**  
  The model will “remember” this during the session, even if it wasn’t present in the training corpus.
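
Resetting session memory between users then amounts to re-initializing those parameters. A small sketch, assuming (as in the block sketch above) that session-adapter submodules have `session_adapter` in their names:

```python
def reset_session_memory(model):
    """Re-initialize every session adapter so a new user or session starts from a clean slate."""
    for name, module in model.named_modules():
        if "session_adapter" in name and hasattr(module, "reset_parameters"):
            module.reset_parameters()
```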

---

## Customization & Ideas

- Use BPE/subword tokenization for more expressive modeling (recommended for non-trivial datasets)
- Add more adapters or experiment with gating (e.g., blend adapters by context)
- Combine with a key-value retrieval or buffer for truly persistent “user memory”
- Visualize training with TensorBoard or wandb
- Tinker with alternative attention or memory mechanisms

---

## Requirements

- Python 3.8+
- [PyTorch](https://pytorch.org/)
- [tokenizers](https://github.com/huggingface/tokenizers) (for BPE/subword)

Install dependencies with:
```bash
pip install torch tokenizers
```

---

## Credits

- Inspired by [nanoGPT](https://github.com/karpathy/nanoGPT) and [minGPT](https://github.com/karpathy/minGPT) by Andrej Karpathy
- Adapter and continual-learning inspiration from recent NLP research ([Houlsby et al. 2019](https://arxiv.org/abs/1902.00751))
- Built using concepts from the original [GPT-1 paper](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf)

---

## License

MIT License – Use freely for learning and experimentation.

---

**Happy tinkering with dual-memory transformers!**