Max models [pretrain]
Multilingual language model collection with frozen, unified Unicode-based embeddings. Includes Russian and Chinese models and their MoE fusion.
Note: Demonstration of practical MoE fusion for language models via shared, frozen, non-semantic glyph/visual token embeddings. Each expert is trained separately with the same fixed embeddings, then fused directly, with no retraining of embeddings and no catastrophic forgetting (see the fusion sketch below). This is a research model illustrating a new family of fusable, modular LMs.
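The fusion mechanism described in the note can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the class names (`ExpertLM`, `MoEFusion`), dimensions, and the token-wise router are illustrative assumptions. The point it shows is that experts sharing one frozen embedding operate in a compatible input space, so their outputs can be mixed without retraining the embedding or the experts.

```python
import torch
import torch.nn as nn


class ExpertLM(nn.Module):
    """One expert: trainable transformer blocks and an LM head over a frozen embedding."""

    def __init__(self, shared_embedding: nn.Embedding, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = shared_embedding  # frozen, shared across all experts
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, shared_embedding.num_embeddings)

    def forward(self, input_ids):
        h = self.embed(input_ids)  # no gradient ever flows into the embedding
        causal = nn.Transformer.generate_square_subsequent_mask(input_ids.size(1)).to(h.device)
        h = self.blocks(h, mask=causal)
        return self.lm_head(h)  # (batch, seq, vocab) logits


class MoEFusion(nn.Module):
    """Fuse already-trained experts with a small token-wise router; embeddings untouched."""

    def __init__(self, experts, d_model=512):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        self.router = nn.Linear(d_model, len(experts))

    def forward(self, input_ids):
        # All experts read the same frozen embedding, so their representations
        # are compatible and their logits can be mixed without any retraining.
        h = self.experts[0].embed(input_ids)
        gates = torch.softmax(self.router(h), dim=-1)                   # (B, T, E)
        logits = torch.stack([e(input_ids) for e in self.experts], -1)  # (B, T, V, E)
        return (logits * gates.unsqueeze(-2)).sum(-1)                   # (B, T, V)


vocab_size, d_model = 32000, 512
shared = nn.Embedding(vocab_size, d_model)        # stand-in for glyph-based embeddings
shared.weight.requires_grad_(False)               # frozen: never updated during training

ru_expert = ExpertLM(shared, d_model)             # would be trained on Russian data
zh_expert = ExpertLM(shared, d_model)             # would be trained on Chinese data
moe = MoEFusion([ru_expert, zh_expert], d_model)  # only the small router needs training
fused_logits = moe(torch.randint(0, vocab_size, (1, 16)))  # torch.Size([1, 16, 32000])
```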
Bochkov/max_bvv_ru
Note: max_bvv_ru is a causal language model trained on Russian data with a distinctive property: its token embedding matrix is frozen, built from visual/Unicode glyph features of the tokens, and never optimized during language-modeling training (see the embedding-construction sketch below). Model size: 0.4B parameters. Purpose: to show that the transformer blocks (not the embeddings) can learn nontrivial semantics, and to enable future model fusion via shared embeddings. This is a proof-of-concept checkpoint; performance is limited by the training data and model size.
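A minimal sketch of the frozen-embedding construction, under simplifying assumptions: the actual models reportedly derive embeddings from visual/glyph features, whereas here a simple deterministic codepoint statistic stands in, and the helpers `token_unicode_features` and `build_frozen_embedding` are hypothetical names.

```python
import unicodedata

import torch
import torch.nn as nn
import torch.nn.functional as F


def token_unicode_features(token: str, dim: int = 512) -> torch.Tensor:
    """Deterministic, non-semantic features computed from a token's codepoints."""
    feats = torch.zeros(dim)
    for i, ch in enumerate(token):
        cp = ord(ch)
        feats[(cp + i) % dim] += 1.0                      # scatter codepoint identity
        cat = sum(ord(c) for c in unicodedata.category(ch))
        feats[cat % dim] += 0.5                           # coarse Unicode-category signal
    return feats


def build_frozen_embedding(vocab: list[str], dim: int = 512) -> nn.Embedding:
    weight = torch.stack([token_unicode_features(t, dim) for t in vocab])
    weight = F.normalize(weight, dim=-1)                  # unit-norm rows
    return nn.Embedding.from_pretrained(weight, freeze=True)


vocab = ["привет", "мир", "你好", "世界", "<pad>"]
frozen_emb = build_frozen_embedding(vocab)
assert frozen_emb.weight.requires_grad is False           # never optimized by the LM loss
```

Because the matrix is a pure function of the token strings, any model built over the same vocabulary gets byte-for-byte identical embeddings, which is what makes later fusion possible.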
Bochkov/max_bvv_zh
Note: Chinese counterpart of max_bvv_ru. Model size: 0.4B parameters. Frozen token embeddings derived from glyph/visual/Unicode statistics, not trained on text. All transformer and output layers are trained; the embeddings remain fixed, which enables straightforward fusion with other models sharing these embeddings (a loading sketch follows below). This is a proof-of-concept checkpoint; performance is limited by the training data and model size.
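A hedged usage sketch for loading one of these checkpoints, assuming they expose a standard `transformers` causal-LM interface; `trust_remote_code`, the prompt, and the generation settings are assumptions, not documented on this page.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Bochkov/max_bvv_zh"  # or "Bochkov/max_bvv_ru"
# trust_remote_code is an assumption about how these repos are packaged.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

# If fine-tuning further, keep the glyph-based embeddings fixed so the model
# stays fusable with other checkpoints sharing the same embedding space.
model.get_input_embeddings().weight.requires_grad_(False)

inputs = tok("你好，世界", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```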
Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations
Paper • 2507.04886
Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
Paper • 2507.07129