Best demo models [pretrain]
Frozen embedding LMs (en/ru/zh) & their MoE fusion. Baselines: frozen vs unfrozen embedding ablation.
Bochkov/best_bvv_moe
best_bvv_moe is a demonstration-scale Mixture-of-Experts (MoE) decoder-only causal language model that combines two independently trained models (Russian and Chinese) with a strictly frozen, shared visual/Unicode-based token embedding matrix. Each "expert" was pre-trained on a small bilingual corpus (English-Russian and English-Chinese, respectively) of ~9B total tokens with ~10% SFT-like samples mixed in, using the same, fully frozen embedding matrix for all languages.
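A minimal PyTorch sketch of the fusion idea above: two pre-trained expert stacks sit behind one shared, frozen embedding matrix, and a lightweight router mixes their outputs. The router, the stand-in TransformerEncoder experts, and all sizes are illustrative assumptions, not the released best_bvv_moe architecture.

```python
# Illustrative-only sketch: fuse two pre-trained decoder stacks behind one
# shared, frozen embedding. Router, sizes, and the stand-in TransformerEncoder
# experts (no causal mask, for brevity) are assumptions, not the real model.
import torch
import torch.nn as nn

class TwoExpertMoE(nn.Module):
    def __init__(self, shared_embedding, expert_ru, expert_zh, lm_head):
        super().__init__()
        self.embed = shared_embedding
        self.embed.weight.requires_grad = False                      # strictly frozen
        self.experts = nn.ModuleList([expert_ru, expert_zh])
        self.router = nn.Linear(shared_embedding.embedding_dim, 2)   # toy per-sequence gate
        self.lm_head = lm_head

    def forward(self, input_ids):
        h = self.embed(input_ids)                                    # (B, T, D)
        gate = self.router(h.mean(dim=1)).softmax(dim=-1)            # (B, 2)
        outs = torch.stack([e(h) for e in self.experts])             # (2, B, T, D)
        mixed = (gate.t()[:, :, None, None] * outs).sum(0)           # weighted expert mix
        return self.lm_head(mixed)                                   # (B, T, vocab)

vocab, dim = 1024, 64
make_expert = lambda: nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
emb = nn.Embedding(vocab, dim)   # stands in for the shared matrix both experts were trained with
moe = TwoExpertMoE(emb, make_expert(), make_expert(), nn.Linear(dim, vocab))
logits = moe(torch.randint(0, vocab, (2, 16)))                       # -> (2, 16, 1024)
```

Because both experts were trained against the identical frozen embedding, they consume the same input representation, which is what makes this kind of post-hoc fusion possible without retraining.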
Bochkov/best_bvv_ru
Proof-of-concept Transformer LM with frozen, non-semantic token embeddings, trained on a small English-Russian corpus. This model is part of a series designed to demonstrate (1) the viability of transformer language models whose embedding layer is precomputed from non-semantic (Unicode/visual) features and kept entirely frozen during training, and (2) the possibility of modular/federated model fusion (MoE) by combining models that share a token embedding matrix, without any additional retraining.
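A minimal sketch of point (1), under an assumed toy feature recipe: the embedding table is derived deterministically from each token's Unicode codepoints (no learned semantics) and loaded frozen. The actual visual/Unicode feature construction used by the best_bvv_* models is not reproduced here.

```python
# Sketch only: build a deterministic, non-semantic embedding table from
# Unicode codepoints and freeze it. The real visual-feature recipe differs.
import torch
import torch.nn as nn
import torch.nn.functional as F

def unicode_feature_embedding(vocab, dim):
    table = torch.zeros(len(vocab), dim)
    for i, token in enumerate(vocab):
        for pos, ch in enumerate(token):
            gen = torch.Generator().manual_seed(ord(ch))   # seeded by codepoint only
            table[i] += torch.randn(dim, generator=gen) / (pos + 1)
    return F.normalize(table, dim=-1)

vocab = ["<pad>", "hello", "мир", "世界"]
embed = nn.Embedding.from_pretrained(unicode_feature_embedding(vocab, 64), freeze=True)
assert not embed.weight.requires_grad                      # never updated during training
```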
Bochkov/best_bvv_unfrozen_ru
best_bvv_unfrozen_ru is a 500M-parameter causal language model for Russian (and some English), trained as an open baseline for the "frozen embeddings" proof-of-concept. This version uses fully trainable token embeddings (the standard setup) and serves as a direct point of comparison with the corresponding frozen-embedding model Bochkov/best_bvv_ru.
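A hedged sketch of the ablation this baseline enables: the two variants differ only in whether the embedding weights receive gradients. get_input_embeddings() is the standard transformers accessor; whether these specific checkpoints expose it through their custom code is an assumption.

```python
# Sketch of the frozen-vs-unfrozen ablation switch. Assumes a Hugging Face
# transformers-style model exposing get_input_embeddings(); the checkpoints
# above may package this differently.
def set_embedding_trainable(model, trainable: bool) -> None:
    model.get_input_embeddings().weight.requires_grad = trainable

def count_trainable(model) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```

Note that freezing the embeddings of best_bvv_unfrozen_ru this way would only approximate the frozen variant, since the frozen models also start from the visual/Unicode initialization rather than learned weights.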
Bochkov/best_bvv_zh
best_bvv_zh is a conceptual bilingual (English + Chinese) transformer language model trained from scratch on a limited 9B-token corpus, as a demonstration of the frozen-embedding hypothesis for robust, language-agnostic, and easily combinable language models. The embedding matrix is frozen after visual (Unicode-morpheme) initialization; all transformer layers and the output head are trainable.
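A small sketch of that training split, using a toy stand-in model: the embedding is excluded from optimization while the transformer blocks and output head are trained. Module names, sizes, and hyperparameters are placeholders, not the released configuration.

```python
# Toy stand-in illustrating "embedding frozen, everything else trainable".
# Architecture, names, and hyperparameters are placeholders.
import torch
import torch.nn as nn

class TinyFrozenEmbedLM(nn.Module):
    def __init__(self, vocab=512, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.embed.weight.requires_grad = False   # visual/Unicode init assumed done elsewhere
        self.blocks = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.lm_head = nn.Linear(dim, vocab)      # trainable output head

    def forward(self, ids):
        return self.lm_head(self.blocks(self.embed(ids)))

model = TinyFrozenEmbedLM()
# only parameters with requires_grad=True (blocks + head) reach the optimizer
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=3e-4)
```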
Bochkov/best_bvv_unfrozen_zh
best_bvv_unfrozen_zh is a 0.5B-parameter causal Transformer language model trained on a minimal combined English-Chinese corpus (9B tokens total, ~10% SFT/instruction mix) with an open-vocabulary Unicode-based tokenizer. The embedding layer is trainable (not frozen), for direct comparison with the frozen-embedding variant best_bvv_zh.
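A minimal sketch of what "open-vocabulary Unicode-based tokenizer" can mean: every character maps to its codepoint plus a small special-token offset, so no input is ever out-of-vocabulary. The released tokenizer's special tokens and ID layout are assumptions and may differ.

```python
# Sketch of a codepoint-level, open-vocabulary tokenizer. The released
# tokenizer's special tokens and ID offsets may differ.
SPECIALS = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
OFFSET = len(SPECIALS)

def encode(text: str) -> list[int]:
    return [SPECIALS["<bos>"]] + [ord(c) + OFFSET for c in text] + [SPECIALS["<eos>"]]

def decode(ids: list[int]) -> str:
    return "".join(chr(i - OFFSET) for i in ids if i >= OFFSET)

assert decode(encode("Привет, 世界!")) == "Привет, 世界!"
```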
Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations
Paper • 2507.04886
Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
Paper • 2507.07129
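A hedged usage sketch for trying any of the checkpoints above with the transformers library. Whether these repos require trust_remote_code=True (custom modeling/tokenizer code) is an assumption; drop the flag if the standard classes load cleanly.

```python
# Usage sketch; trust_remote_code=True is an assumption about how the repos
# package their custom modeling/tokenizer code.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Bochkov/best_bvv_ru"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tok("Столица России - ", return_tensors="pt")   # Russian prompt: "The capital of Russia is"
out = model.generate(**inputs, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
```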