Max models [pretrain]
Multilingual language model collection with frozen, unified Unicode-based embeddings. Includes Russian and Chinese models and their MoE fusion.
Note: Demonstration of practical MoE fusion for language models via shared, frozen, non-semantic glyph/visual token embeddings. Each expert is trained separately with the same fixed embeddings, then fused directly, with no retraining of embeddings and no catastrophic forgetting (see the fusion sketch below). This is a research model illustrating a new family of fusable, modular LMs.
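The fusion mechanism described in the note can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the class names (`ExpertLM`, `MoEFusion`), dimensions, and the token-wise router are illustrative assumptions. The point it shows is that experts sharing one frozen embedding operate in a compatible input space, so their outputs can be mixed without retraining the embedding or the experts.

```python
import torch
import torch.nn as nn


class ExpertLM(nn.Module):
    """One expert: trainable transformer blocks and an LM head over a frozen embedding."""

    def __init__(self, shared_embedding: nn.Embedding, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = shared_embedding  # frozen, shared across all experts
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, shared_embedding.num_embeddings)

    def forward(self, input_ids):
        h = self.embed(input_ids)  # no gradient ever flows into the embedding
        causal = nn.Transformer.generate_square_subsequent_mask(input_ids.size(1)).to(h.device)
        h = self.blocks(h, mask=causal)
        return self.lm_head(h)  # (batch, seq, vocab) logits


class MoEFusion(nn.Module):
    """Fuse already-trained experts with a small token-wise router; embeddings untouched."""

    def __init__(self, experts, d_model=512):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        self.router = nn.Linear(d_model, len(experts))

    def forward(self, input_ids):
        # All experts read the same frozen embedding, so their representations
        # are compatible and their logits can be mixed without any retraining.
        h = self.experts[0].embed(input_ids)
        gates = torch.softmax(self.router(h), dim=-1)                   # (B, T, E)
        logits = torch.stack([e(input_ids) for e in self.experts], -1)  # (B, T, V, E)
        return (logits * gates.unsqueeze(-2)).sum(-1)                   # (B, T, V)


vocab_size, d_model = 32000, 512
shared = nn.Embedding(vocab_size, d_model)        # stand-in for glyph-based embeddings
shared.weight.requires_grad_(False)               # frozen: never updated during training

ru_expert = ExpertLM(shared, d_model)             # would be trained on Russian data
zh_expert = ExpertLM(shared, d_model)             # would be trained on Chinese data
moe = MoEFusion([ru_expert, zh_expert], d_model)  # only the small router needs training
fused_logits = moe(torch.randint(0, vocab_size, (1, 16)))  # torch.Size([1, 16, 32000])
```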
Bochkov/max_bvv_ru
Note: max_bvv_ru is a causal language model trained on Russian data with a distinctive property: its token embedding matrix is frozen, built from visual/Unicode glyph features of the tokens, and never optimized during language-modeling training (see the embedding-construction sketch below). Model size: 0.4B parameters. Purpose: to show that the transformer blocks (not the embeddings) can learn nontrivial semantics, and to enable future model fusion via shared embeddings. This is a proof-of-concept checkpoint; performance is limited by the training data and model size.
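A minimal sketch of the frozen-embedding construction, under simplifying assumptions: the actual models reportedly derive embeddings from visual/glyph features, whereas here a simple deterministic codepoint statistic stands in, and the helpers `token_unicode_features` and `build_frozen_embedding` are hypothetical names.

```python
import unicodedata

import torch
import torch.nn as nn
import torch.nn.functional as F


def token_unicode_features(token: str, dim: int = 512) -> torch.Tensor:
    """Deterministic, non-semantic features computed from a token's codepoints."""
    feats = torch.zeros(dim)
    for i, ch in enumerate(token):
        cp = ord(ch)
        feats[(cp + i) % dim] += 1.0                      # scatter codepoint identity
        cat = sum(ord(c) for c in unicodedata.category(ch))
        feats[cat % dim] += 0.5                           # coarse Unicode-category signal
    return feats


def build_frozen_embedding(vocab: list[str], dim: int = 512) -> nn.Embedding:
    weight = torch.stack([token_unicode_features(t, dim) for t in vocab])
    weight = F.normalize(weight, dim=-1)                  # unit-norm rows
    return nn.Embedding.from_pretrained(weight, freeze=True)


vocab = ["привет", "мир", "你好", "世界", "<pad>"]
frozen_emb = build_frozen_embedding(vocab)
assert frozen_emb.weight.requires_grad is False           # never optimized by the LM loss
```

Because the matrix is a pure function of the token strings, any model built over the same vocabulary gets byte-for-byte identical embeddings, which is what makes later fusion possible.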
Bochkov/max_bvv_zh
Note: Chinese counterpart of max_bvv_ru. Model size: 0.4B parameters. Frozen token embeddings derived from glyph/visual/Unicode statistics, not trained on text. All transformer and output layers are trained; the embeddings remain fixed, which enables straightforward fusion with other models sharing these embeddings (a loading sketch follows below). This is a proof-of-concept checkpoint; performance is limited by the training data and model size.
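A hedged usage sketch for loading one of these checkpoints, assuming they expose a standard `transformers` causal-LM interface; `trust_remote_code`, the prompt, and the generation settings are assumptions, not documented on this page.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Bochkov/max_bvv_zh"  # or "Bochkov/max_bvv_ru"
# trust_remote_code is an assumption about how these repos are packaged.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

# If fine-tuning further, keep the glyph-based embeddings fixed so the model
# stays fusable with other checkpoints sharing the same embedding space.
model.get_input_embeddings().weight.requires_grad_(False)

inputs = tok("你好，世界", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```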
Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations
Paper • 2507.04886
Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
Paper • 2507.07129