Progressive Growth Transformers (PGT) [pretrain]
Transformers grown layer by layer on frozen embeddings, exploring how capabilities emerge with depth.
Bochkov/abs-bvv-6
Text Generation • Note: abs-bvv-6 is a 2.3-billion-parameter decoder-only Transformer model. Embeddings: the token embedding layer is frozen, derived from visual representations of Unicode glyphs, and never updated during training. Training method: progressive layer-wise growth. The model was built by training one layer at a time: layer 1 was trained to convergence and then frozen, layer 2 was added and trained, and so on. For the deeper layers (5 and 6), LoRA was used to fine-tune all existing layers.
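The note above fully determines the freezing pattern, so a small sketch can make it concrete. The PyTorch code below is a minimal illustration under stated assumptions (block type, head count, and vocabulary size are guesses, and the LoRA step used for layers 5 and 6 is omitted); it is not the authors' released code.

```python
import torch.nn as nn

def freeze(module: nn.Module) -> None:
    """Detach a module from training; used for the embeddings and all earlier blocks."""
    for p in module.parameters():
        p.requires_grad_(False)

vocab_size, d_model = 65536, 4096            # illustrative vocab size; d_model matches the cards
embed = nn.Embedding(vocab_size, d_model)    # stands in for the glyph-derived table
freeze(embed)                                # never updated during training

blocks = nn.ModuleList()
for depth in range(6):                       # grow from abs-bvv-1 up to abs-bvv-6
    # Decoder-only behavior would come from a causal mask at run time.
    blocks.append(nn.TransformerEncoderLayer(d_model, nhead=32, batch_first=True))
    for earlier in blocks[:-1]:              # only the newest block stays trainable
        freeze(earlier)
    trainable = [p for p in blocks[-1].parameters() if p.requires_grad]
    # ... train `trainable` to convergence with a causal LM loss, then continue growing ...
```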
Bochkov/abs-bvv-5
Text Generation • Note: abs-bvv-5 is a 2.1-billion-parameter decoder-only Transformer model. It is the fifth model in the Progressive Growth Transformers (PGT) series. The PGT series shows that:
- Semantic understanding can emerge without trainable embeddings.
- Complex reasoning abilities are a direct result of compositional depth.
- Models can be built incrementally, much like a living organism grows, rather than being forged all at once.
abs-bvv-5 represents the state of the model after 5 layers of progressive training.
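How a frozen, non-semantic embedding table might be built is worth one concrete example. The sketch below renders Unicode glyphs to bitmaps and projects them to the model width; the actual pipeline (font, resolution, projection) is not described in these cards, so every detail here is an assumption.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def glyph_bitmap(ch: str, size: int = 16) -> np.ndarray:
    """Render one Unicode character to a flattened grayscale bitmap."""
    img = Image.new("L", (size, size), color=0)
    ImageDraw.Draw(img).text((0, 0), ch, fill=255, font=ImageFont.load_default())
    return np.asarray(img, dtype=np.float32).flatten() / 255.0

# A fixed random projection lifts pixel space to the model width; once built,
# the whole table is frozen, so no embedding parameter is ever trained.
rng = np.random.default_rng(0)
proj = rng.standard_normal((16 * 16, 4096)).astype(np.float32) / 16.0
table = np.stack([glyph_bitmap(chr(cp)) @ proj for cp in range(0x20, 0x7F)])
```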
Bochkov/abs-bvv-4
Text Generation • Note: abs-bvv-4 is a 1.9-billion-parameter decoder-only Transformer model. It is the 4th model in the Progressive Growth Transformers (PGT) series. abs-bvv-4 represents the state of the model after 4 layers of progressive training. It has 4 Transformer blocks and a hidden dimension of 4096.
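The ~0.2B parameter step between successive models (1.3, 1.5, 1.7, 1.9, 2.1, 2.3 billion) is consistent with one standard Transformer block at this width. A rough sanity check, assuming the usual 4x MLP expansion (the actual block layout is not stated in the cards):

```python
d_model = 4096
per_block = 12 * d_model ** 2    # 4*d^2 (attention) + 8*d^2 (MLP with 4x expansion)
print(f"{per_block / 1e9:.2f}B parameters per block")   # ~0.20B, matching the step size
```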
Bochkov/abs-bvv-3
Text Generation • Note: abs-bvv-3 is a 1.7-billion-parameter decoder-only Transformer model. It is the 3rd model in the Progressive Growth Transformers (PGT) series. abs-bvv-3 represents the state of the model after 3 layers of progressive training. It has 3 Transformer blocks and a hidden dimension of 4096.
Bochkov/abs-bvv-2
Text Generation • Note: abs-bvv-2 is a 1.5-billion-parameter decoder-only Transformer model. It is the second model in the Progressive Growth Transformers (PGT) series, designed to explore how linguistic and reasoning capabilities emerge as a function of model depth.
Bochkov/abs-bvv-1
Text Generation • Note: abs-bvv-1 is a 1.3-billion-parameter decoder-only Transformer model. It is the first model in the Progressive Growth Transformers (PGT) series, designed to explore how linguistic and reasoning capabilities emerge as a function of model depth. This model was not trained monolithically. Instead, it was "grown" constructively, one layer at a time, upon a foundation of frozen, non-semantic visual embeddings.
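If these repositories follow the usual Hub conventions, loading should look like the sketch below. This is an assumption rather than documented usage: trust_remote_code=True is guessed because the frozen-embedding architecture is likely custom, and the prompt is arbitrary.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Bochkov/abs-bvv-1"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)        # assumed interface
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tok("The model was grown one layer at a time because", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(tok.decode(out[0], skip_special_tokens=True))
```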
Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations
Paper • 2507.04886 • Published
Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
Paper • 2507.07129 • Published