Progressive Growth Transformers (PGT) [pretrain]
Transformers grown layer by layer on frozen embeddings, exploring how capabilities emerge with depth.
Bochkov/abs-bvv-6
Text Generation • Note: abs-bvv-6 is a 2.3-billion-parameter decoder-only Transformer model. Embeddings: the token embedding layer is frozen, derived from visual representations of Unicode glyphs, and never updated during training. Training method: progressive layer-wise growth. The model was built by training one layer at a time: layer 1 was trained to convergence and then frozen, layer 2 was added and trained, and so on. For the deeper layers (5 and 6), LoRA was used to fine-tune all existing layers.
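The note above fully determines the freezing pattern, so a small sketch can make it concrete. The PyTorch code below is a minimal illustration under stated assumptions (block type, head count, and vocabulary size are guesses, and the LoRA step used for layers 5 and 6 is omitted); it is not the authors' released code.

```python
import torch.nn as nn

def freeze(module: nn.Module) -> None:
    """Detach a module from training; used for the embeddings and all earlier blocks."""
    for p in module.parameters():
        p.requires_grad_(False)

vocab_size, d_model = 65536, 4096            # illustrative vocab size; d_model matches the cards
embed = nn.Embedding(vocab_size, d_model)    # stands in for the glyph-derived table
freeze(embed)                                # never updated during training

blocks = nn.ModuleList()
for depth in range(6):                       # grow from abs-bvv-1 up to abs-bvv-6
    # Decoder-only behavior would come from a causal mask at run time.
    blocks.append(nn.TransformerEncoderLayer(d_model, nhead=32, batch_first=True))
    for earlier in blocks[:-1]:              # only the newest block stays trainable
        freeze(earlier)
    trainable = [p for p in blocks[-1].parameters() if p.requires_grad]
    # ... train `trainable` to convergence with a causal LM loss, then continue growing ...
```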
Bochkov/abs-bvv-5
Text Generation • Note: abs-bvv-5 is a 2.1-billion-parameter decoder-only Transformer model. It is the fifth model in the Progressive Growth Transformers (PGT) series. The PGT series shows that:
- Semantic understanding can emerge without trainable embeddings.
- Complex reasoning abilities are a direct result of compositional depth.
- Models can be built incrementally, much like a living organism grows, rather than being forged all at once.
abs-bvv-5 represents the state of the model after 5 layers of progressive training.
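How a frozen, non-semantic embedding table might be built is worth one concrete example. The sketch below renders Unicode glyphs to bitmaps and projects them to the model width; the actual pipeline (font, resolution, projection) is not described in these cards, so every detail here is an assumption.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def glyph_bitmap(ch: str, size: int = 16) -> np.ndarray:
    """Render one Unicode character to a flattened grayscale bitmap."""
    img = Image.new("L", (size, size), color=0)
    ImageDraw.Draw(img).text((0, 0), ch, fill=255, font=ImageFont.load_default())
    return np.asarray(img, dtype=np.float32).flatten() / 255.0

# A fixed random projection lifts pixel space to the model width; once built,
# the whole table is frozen, so no embedding parameter is ever trained.
rng = np.random.default_rng(0)
proj = rng.standard_normal((16 * 16, 4096)).astype(np.float32) / 16.0
table = np.stack([glyph_bitmap(chr(cp)) @ proj for cp in range(0x20, 0x7F)])
```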
Bochkov/abs-bvv-4
Text Generation • Note: abs-bvv-4 is a 1.9-billion-parameter decoder-only Transformer model. It is the 4th model in the Progressive Growth Transformers (PGT) series. abs-bvv-4 represents the state of the model after 4 layers of progressive training. It has 4 Transformer blocks and a hidden dimension of 4096.
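The ~0.2B parameter step between successive models (1.3, 1.5, 1.7, 1.9, 2.1, 2.3 billion) is consistent with one standard Transformer block at this width. A rough sanity check, assuming the usual 4x MLP expansion (the actual block layout is not stated in the cards):

```python
d_model = 4096
per_block = 12 * d_model ** 2    # 4*d^2 (attention) + 8*d^2 (MLP with 4x expansion)
print(f"{per_block / 1e9:.2f}B parameters per block")   # ~0.20B, matching the step size
```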
Bochkov/abs-bvv-3
Text Generation • Note: abs-bvv-3 is a 1.7-billion-parameter decoder-only Transformer model. It is the 3rd model in the Progressive Growth Transformers (PGT) series. abs-bvv-3 represents the state of the model after 3 layers of progressive training. It has 3 Transformer blocks and a hidden dimension of 4096.
Bochkov/abs-bvv-2
Text Generation • Note: abs-bvv-2 is a 1.5-billion-parameter decoder-only Transformer model. It is the second model in the Progressive Growth Transformers (PGT) series, designed to explore how linguistic and reasoning capabilities emerge as a function of model depth.
Bochkov/abs-bvv-1
Text Generation • Note: abs-bvv-1 is a 1.3-billion-parameter decoder-only Transformer model. It is the first model in the Progressive Growth Transformers (PGT) series, designed to explore how linguistic and reasoning capabilities emerge as a function of model depth. This model was not trained monolithically. Instead, it was "grown" constructively, one layer at a time, upon a foundation of frozen, non-semantic visual embeddings.
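If these repositories follow the usual Hub conventions, loading should look like the sketch below. This is an assumption rather than documented usage: trust_remote_code=True is guessed because the frozen-embedding architecture is likely custom, and the prompt is arbitrary.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Bochkov/abs-bvv-1"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)        # assumed interface
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tok("The model was grown one layer at a time because", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(tok.decode(out[0], skip_special_tokens=True))
```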
Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations
Paper • 2507.04886 • Published
Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
Paper • 2507.07129 • Published