Oracle850B-MoE: Mixture of Experts Language Model
Project: Oracle, a line of proprietary reasoning LLMs by Mβ1 Corporation
Model: Oracle850B-MoE
(850B parameters, Mixture of Experts - 128 experts)
Author: MagistrTheOne|Krasnodar|2025
Repository: MagistrTheOne/oracle850b-moe
Oracle850B-MoE is Mβ1's proprietary architecture with a total of ≈850B parameters (128 experts, top-k=2, ≈180-220B active). OWN MODEL / NO EXTERNAL CHECKPOINTS. This repository covers data, infrastructure, and config preparation; training itself is launched on an external cluster.
Strict Rules
- NO LOCAL TRAIN. ALLOW_LOCAL_TRAIN=false; any training run fails with an explanatory message.
- NO EXTERNAL WEIGHTS. Links/downloads to GPT-2/LLaMA/Mistral/Qwen/Phi/Gemma/OPT etc. are prohibited. CI guard is mandatory.
- Preparation only: code, configs, mock artifacts, dry-run; mini-samples for pipeline verification.
- Identity: the special tokens <|oracle_sys|>, <|oracle_intro|>, <|author|> are auto-injected in serve.
Architecture
MoE-850B Configuration
{
  "model_name": "oracle850b-moe",
  "arch": "decoder-only",
  "param_total": 850000000000,
  "moe": {
    "experts": 128,
    "expert_hidden": 2816,
    "router": {"type": "topk", "k": 2, "load_balancing_loss": 0.01}
  },
  "dense": {"d_model": 8192, "n_layers": 96, "n_heads": 64, "d_ff": 24576},
  "activation": "swiglu",
  "rope_theta": 10000,
  "rotary_pct": 0.5,
  "rmsnorm_eps": 1e-5,
  "flash_attn": true,
  "kv_cache": true,
  "vocab_size": 131072,
  "max_seq_len": 16384,
  "fp": {"train": "bf16", "infer": "auto"}
}
Explanation: the total parameter count is ≈850B because of the expert pool; only 2 experts are active per token, so the "active parameters" are roughly 180-220B. This gives 200B-class quality with fewer FLOPs.
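As a back-of-the-envelope check of the ≈850B figure, the sketch below counts only the expert pool, assuming SwiGLU experts (three d_model × expert_hidden weight matrices each) in all 96 layers; attention, dense-FFN, and embedding parameters come on top, and the exact breakdown is an assumption rather than the repository's actual counting code.

# Rough expert-pool parameter count (sketch; assumes SwiGLU experts in every layer).
d_model       = 8192
expert_hidden = 2816
n_experts     = 128
n_layers      = 96

params_per_expert = 3 * d_model * expert_hidden    # gate + up + down projections, ~69.2M
pool_per_layer    = n_experts * params_per_expert  # ~8.86B
expert_pool_total = n_layers * pool_per_layer      # ~850.4B, matching the ~850B total

print(f"expert pool: {expert_pool_total / 1e9:.1f}B parameters")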
Special Tokens
- <|oracle_sys|>: Oracle system token
- <|oracle_intro|>: Oracle introductory token
- <|author|>: author token (MagistrTheOne|Krasnodar|2025|850B)
- <|endoftext|>: end of text
- <|pad|>: padding
- <|unk|>: unknown token
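For illustration only, a minimal sketch of the serve-time identity injection described above; the helper name and system text are assumptions, not the actual src/oracle/core/serve code.

# Hypothetical serve-time injection of the Oracle identity tokens.
AUTHOR_TAG = "MagistrTheOne|Krasnodar|2025|850B"

def build_prompt(user_text: str, system_text: str = "You are Oracle.") -> str:
    """Wrap every request with the Oracle system, author and intro tokens."""
    return (
        f"<|oracle_sys|>{system_text}"
        f"<|author|>{AUTHOR_TAG}"
        f"<|oracle_intro|>{user_text}<|endoftext|>"
    )

print(build_prompt("Explain top-k expert routing."))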
TB-Scale Data Pipeline
Pipeline Structure
obj://oracle-data/raw/...              # source data (landed by ingest.py)
  ↓ clean_generic.py
obj://oracle-data/clean/...            # cleaned data
  ↓ decontaminate.py
obj://oracle-data/decontaminated/...   # decontaminated data
  ↓ shard_webdataset.py
obj://oracle-data/webdataset/...       # WebDataset shards
  ↓ stats.py
obj://oracle-data/stats/...            # statistics and reports
Processing Scripts
- ingest.py: intake from S3/HTTPS; JSON manifest (source, license, size, hashes)
- clean_generic.py: Unicode normalization, dedup (MinHash/LSH), language filtering (ru/en), PII, toxicity
- decontaminate.py: evaluation stop-lists; intersection reports
- shard_webdataset.py: packaging into tar shards (e.g., 512 MB), .idx index, map-style access
- stats.py: summaries (duplicates, languages, topics, lengths)
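As a self-contained illustration of the MinHash deduplication step mentioned for clean_generic.py, the sketch below estimates Jaccard similarity between two documents; the shingle size and number of hash seeds are assumptions, and the real script may rely on a dedicated LSH library instead.

# Toy MinHash near-duplicate check (sketch; clean_generic.py may differ).
import hashlib

NUM_PERM = 64   # number of hash seeds (assumption)
SHINGLE  = 5    # word-shingle size (assumption)

def shingles(text: str) -> set[str]:
    words = text.lower().split()
    return {" ".join(words[i:i + SHINGLE]) for i in range(max(1, len(words) - SHINGLE + 1))}

def minhash(text: str) -> list[int]:
    sig = []
    for seed in range(NUM_PERM):
        sig.append(min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in shingles(text)
        ))
    return sig

def jaccard_estimate(a: list[int], b: list[int]) -> float:
    return sum(x == y for x, y in zip(a, b)) / NUM_PERM

doc_a = "Oracle850B is a mixture of experts model with 128 experts and top-k routing"
doc_b = doc_a + " enabled"
print(jaccard_estimate(minhash(doc_a), minhash(doc_b)))  # high value (~0.9) flags a near-duplicate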
Training: Parallelism and Checkpoints
Training Configuration
seq_len: 16384
micro_bsz: 1
global_bsz: 4096
grad_accum: 512
precision: bf16
parallelism:
  tensor: 16      # TP
  pipeline: 12    # PP (stages)
  sequence: true  # SP (ops sharding)
moe:
  top_k: 2
  capacity_factor: 1.25
  zloss: 0.001
opt: adamw
lr: 8e-5
warmup_steps: 8000
max_steps: 800000
checkpoint:
  every_steps: 1000
  keep_last: 3
  s3_mirror: true
logging: json
Launcher Requirements
- Support for TP/PP/SP mapping across nodes/GPUs (16×TP, 12×PP)
- Elastic restart, automatic resume from the last fully loaded checkpoint
- Dry-run: verify the layout without launching any actual compute (see the sketch below)
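A sketch of the kind of dry-run layout check the launcher could perform, assuming the standard relation global_bsz = micro_bsz × grad_accum × data-parallel replicas; the numbers follow from the config above, but the helper itself is hypothetical.

# Hypothetical dry-run layout check (not the repository's launcher code).
cfg = {
    "seq_len": 16384, "micro_bsz": 1, "global_bsz": 4096, "grad_accum": 512,
    "tensor": 16, "pipeline": 12,
}

# Assumes global_bsz = micro_bsz * grad_accum * data_parallel.
dp = cfg["global_bsz"] // (cfg["micro_bsz"] * cfg["grad_accum"])   # 8 replicas
gpus_per_replica = cfg["tensor"] * cfg["pipeline"]                 # 16 x 12 = 192
world_size = gpus_per_replica * dp                                 # 1536 GPUs
tokens_per_step = cfg["global_bsz"] * cfg["seq_len"]               # ~67M tokens per optimizer step

assert cfg["micro_bsz"] * cfg["grad_accum"] * dp == cfg["global_bsz"], "batch layout mismatch"
print(f"DP={dp}, GPUs/replica={gpus_per_replica}, world={world_size}, tokens/step={tokens_per_step:,}")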
Cloud Orchestration
Terraform (Yandex Cloud)
- VPC, Object Storage, Container Registry
- Kubernetes cluster with GPU nodes
- Budget constraints and alerts
- Monitoring and logging
Helm Charts
- Charts for training and serving
- Resource configuration and tolerations
- Service accounts and RBAC
Kill Switch
- Emergency stop of all pipelines
- Terraform resource destruction
- Pre-flight checks
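Purely as an illustration of the emergency-stop idea (not the repository's kill_switch.py), the sketch below scales training deployments to zero with the official Kubernetes Python client; the namespace is an assumption, and Terraform resource destruction plus pre-flight checks would be separate steps.

# Illustrative emergency-stop sketch (NOT the repository's kill_switch.py).
from kubernetes import client, config

def emergency_stop(namespace: str = "oracle-train") -> None:
    config.load_kube_config()                 # or load_incluster_config() inside the cluster
    apps = client.AppsV1Api()
    for dep in apps.list_namespaced_deployment(namespace).items:
        apps.patch_namespaced_deployment_scale(
            name=dep.metadata.name,
            namespace=namespace,
            body={"spec": {"replicas": 0}},
        )
        print(f"scaled {dep.metadata.name} to 0 replicas")

if __name__ == "__main__":
    emergency_stop()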
CI/CD and Guards
CI Guards
- guard_external_models.yml: fails on mentions of gpt2|llama|mistral|qwen|phi|gemma|opt
- push_to_hub.yml: publishes metadata to HF (Free/Pro via ENV)
Security Scripts
- guard_no_local_train.py: blocks local training
- kill_switch.py: emergency resource shutdown
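A minimal sketch of the guard logic: scan repository files for the forbidden model names and fail the run on any match. The file-selection rules here are assumptions; the actual guard_external_models.yml and guard scripts may differ.

# Illustrative CI guard (sketch; the real guard may differ).
import re
import sys
from pathlib import Path

FORBIDDEN = re.compile(r"\b(gpt2|llama|mistral|qwen|phi|gemma|opt)\b", re.IGNORECASE)
SCAN_SUFFIXES = {".py", ".md", ".yml", ".yaml", ".json", ".txt"}

def scan(root: str = ".") -> int:
    hits = 0
    for path in Path(root).rglob("*"):
        if path.suffix not in SCAN_SUFFIXES or ".git" in path.parts:
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if FORBIDDEN.search(line):
                print(f"{path}:{lineno}: forbidden external-model reference")
                hits += 1
    return hits

if __name__ == "__main__":
    sys.exit(1 if scan() else 0)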
Hugging Face Hub
Publishing Strategy
- Today: push metadata (configs, tokenizer, README, MODEL_CARD)
- Tomorrow (Pro): enable HF_HUB_ENABLE_HF_TRANSFER=1, multi-part uploads; weights only after external training
Environment Variables
HUGGINGFACE_TOKEN=hf_***
HF_REPO=<user>/oracle850b-moe
HF_TIER=free # switch to pro later
HF_HUB_ENABLE_HF_TRANSFER=0
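As an illustration of the metadata-only publishing step, a sketch using the huggingface_hub client; the allow_patterns and folder layout are assumptions, and the repository's push_to_hub.yml workflow (make push-hf) is the authoritative path.

# Metadata-only upload sketch (assumed layout; the push_to_hub workflow is authoritative).
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HUGGINGFACE_TOKEN"])
repo_id = os.environ.get("HF_REPO", "MagistrTheOne/oracle850b-moe")

api.create_repo(repo_id, repo_type="model", exist_ok=True)
api.upload_folder(
    folder_path=".",
    repo_id=repo_id,
    repo_type="model",
    allow_patterns=["*.json", "*.md", "tokenizer*", "configs/*"],  # metadata only, no weights
    commit_message="Publish oracle850b-moe metadata (configs, tokenizer, model card)",
)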
Quick Start
1. Installation
# Clone the repository
git clone https://github.com/MagistrTheOne/oracle850b-moe.git
cd oracle850b-moe
# Create virtual environment
make venv
make install
# Set up environment variables
cp .env.example .env
# Edit .env with your values
2. Verification
# Run CI guards
make ci-guards
# Check project status
make status
# Run tests
make test
3. Data Preparation (dry-run)
# Run data pipeline preparation
make prep-tb
# Infrastructure planning
make infra-plan
4. Upload to HF Hub
# Upload metadata to Hugging Face Hub
make push-hf
Project Structure
oracle850b-moe/
├── src/oracle/core/
│   ├── modeling/          # MoE architecture
│   ├── tokenization/      # Custom tokenizer
│   └── serve/             # FastAPI server
├── configs/
│   ├── model/             # Model configs
│   ├── training/          # Training configs
│   ├── deepspeed/         # DeepSpeed configs
│   └── serve/             # Serving configs
├── datasets/scripts/      # Data processing scripts
├── training/              # Launcher and scheduler
├── infra/
│   ├── terraform/         # Yandex Cloud infrastructure
│   ├── helm/              # Kubernetes charts
│   └── scripts/           # Management scripts
├── ci/                    # CI/CD pipelines
├── scripts/               # Utilities and uploads
└── checkpoints/           # Checkpoints and prompts
Makefile Commands
make help # Show help
make prep-tb # Run data pipeline (dry-run)
make infra-plan # Infrastructure planning
make ci-guards # Run CI guards
make test # Run tests
make clean # Clean temporary files
make kill-all # Emergency shutdown
make push-hf # Upload to HF Hub
Limitations
- Local training prohibited: training runs only on the external cluster
- External models prohibited: only the proprietary architecture is used
- Python 3.11.9 with pinned dependency versions
- Virtual environment for dependency isolation
Support
- Author: MagistrTheOne|Krasnodar|2025|850B
- Repository: https://github.com/MagistrTheOne/oracle850b-moe
- HF Hub: https://huggingface.co/MagistrTheOne/oracle850b-moe
License
[License to be determined]
Disclaimer: Oracle850B is an experimental model. Use at your own risk. The author is not responsible for any consequences of use.