Oracle850B-MoE — Mixture of Experts Language Model

Project: Oracle — a line of proprietary reasoning LLMs by M∞1 Corporation
Model: Oracle850B-MoE (850B parameters, Mixture of Experts, 128 experts)
Author: MagistrTheOne|Krasnodar|2025
Repository: MagistrTheOne/oracle850b-moe

Oracle850B-MoE is M∞1's proprietary decoder-only architecture with ≈850B total parameters (128 experts, top-k = 2, ≈180–220B active). OWN MODEL / NO EXTERNAL CHECKPOINTS. This repository covers data, infrastructure, and config preparation; training itself is launched on an external cluster.

🔒 Strict Rules

  1. NO LOCAL TRAIN. ALLOW_LOCAL_TRAIN=false — any training run fails with an explanatory error (a sketch of this guard follows the list).
  2. NO EXTERNAL WEIGHTS. Links to or downloads of GPT-2/LLaMA/Mistral/Qwen/Phi/Gemma/OPT and similar checkpoints are prohibited. A CI guard is mandatory.
  3. Preparation only: code, configs, mock artifacts, dry-runs; mini-samples for pipeline verification.
  4. Identity: special tokens <|oracle_sys|>, <|oracle_intro|>, <|author|>, auto-injected at serve time.
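
A minimal sketch of the rule-1 behavior, assuming only the ALLOW_LOCAL_TRAIN environment variable; the repository's actual guard_no_local_train.py may differ:

import os
import sys

def assert_no_local_train() -> None:
    # Abort any local training entry point unless the flag is explicitly enabled.
    if os.environ.get("ALLOW_LOCAL_TRAIN", "false").lower() != "true":
        sys.exit("Local training is disabled (ALLOW_LOCAL_TRAIN=false). "
                 "Submit the job to the external cluster instead.")

Calling assert_no_local_train() at the top of every training entry point makes the failure mode explicit and easy to test.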

πŸ—οΈ Architecture

MoE-850B Configuration

{
  "model_name": "oracle850b-moe",
  "arch": "decoder-only",
  "param_total": 850000000000,
  "moe": {
    "experts": 128,
    "expert_hidden": 2816,
    "router": {"type": "topk", "k": 2, "load_balancing_loss": 0.01}
  },
  "dense": {"d_model": 8192, "n_layers": 96, "n_heads": 64, "d_ff": 24576},
  "activation": "swiglu",
  "rope_theta": 10000,
  "rotary_pct": 0.5,
  "rmsnorm_eps": 1e-5,
  "flash_attn": true,
  "kv_cache": true,
  "vocab_size": 131072,
  "max_seq_len": 16384,
  "fp": {"train": "bf16", "infer": "auto"}
}

Explanation: the total parameter count reaches ≈850B through the expert pool; 2 experts are active per token, so the "active" parameter count is ~180–220B. The goal is quality comparable to a ~200B-class dense model at substantially lower FLOPs per token.
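
For illustration, a minimal top-k routing sketch in PyTorch, assuming a Switch-style load-balancing loss; the repository's actual router implementation is not shown here, and the shapes and loss form are assumptions:

import torch
import torch.nn.functional as F

def topk_route(hidden: torch.Tensor, w_router: torch.Tensor,
               k: int = 2, lb_coef: float = 0.01):
    # hidden: [tokens, d_model]; w_router: [d_model, n_experts]
    logits = hidden @ w_router                    # [tokens, n_experts]
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(k, dim=-1)  # 2 experts per token
    # Load-balancing loss: mean routed fraction per expert times mean
    # router probability per expert, scaled by the number of experts.
    n_experts = logits.size(-1)
    frac = F.one_hot(topk_idx, n_experts).float().mean(dim=(0, 1))
    importance = probs.mean(dim=0)
    lb_loss = lb_coef * n_experts * (frac * importance).sum()
    # Renormalize the selected gates so they sum to 1 per token.
    gates = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
    return topk_idx, gates, lb_loss

The lb_coef default mirrors the load_balancing_loss value of 0.01 in the config above.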

Special Tokens

  • <|oracle_sys|> β€” Oracle system token
  • <|oracle_intro|> β€” Oracle introductory token
  • <|author|> β€” author token (MagistrTheOne|Krasnodar|2025|850B)
  • <|endoftext|> β€” end of text
  • <|pad|> β€” padding
  • <|unk|> β€” unknown token

📊 TB-Scale Data Pipeline

Pipeline Structure

obj://oracle-data/raw/...            # Source data (written by ingest.py)
    ↓ clean_generic.py
obj://oracle-data/clean/...          # Cleaned data
    ↓ decontaminate.py
obj://oracle-data/decontaminated/... # Decontaminated data
    ↓ shard_webdataset.py
obj://oracle-data/webdataset/...     # WebDataset shards
    ↓ stats.py
obj://oracle-data/stats/...          # Statistics and reports

Processing Scripts

  • ingest.py β€” intake from S3/HTTPS; JSON manifest (source, license, size, hashes)
  • clean_generic.py β€” unicode normalization, dedup (MinHash/LSH), language (ru/en), PII, toxicity
  • decontaminate.py β€” evaluation stop-lists; intersection reports
  • shard_webdataset.py β€” packaging into tar-shards (e.g., 512MB), .idx index, map-style
  • stats.py β€” summaries (duplicates, languages, topics, lengths)

🚀 Training: Parallelism and Checkpoints

Training Configuration

seq_len: 16384
micro_bsz: 1
global_bsz: 4096
grad_accum: 512
precision: bf16
parallelism:
  tensor: 16     # TP
  pipeline: 12   # PP (stages)
  sequence: true # SP (ops sharding)
moe:
  top_k: 2
  capacity_factor: 1.25
  zloss: 0.001
opt: adamw
lr: 8e-5
warmup_steps: 8000
max_steps: 800000
checkpoint:
  every_steps: 1000
  keep_last: 3
  s3_mirror: true
logging: json

Launcher Requirements

  • Support for TP/PP/SP mapping across nodes/GPU (16Γ—TP, 12Γ—PP)
  • Elastic restart, automatic resume from the last fully loaded checkpoint
  • Dry-run: verify layout without starting math

☁️ Cloud Orchestration

Terraform (Yandex Cloud)

  • VPC, Object Storage, Container Registry
  • Kubernetes cluster with GPU nodes
  • Budget constraints and alerts
  • Monitoring and logging

Helm Charts

  • Charts for training and serving
  • Resource configuration and tolerations
  • Service accounts and RBAC

Kill Switch

  • Emergency stop of all pipelines
  • Terraform resource destruction
  • Pre-flight checks

πŸ›‘οΈ CI/CD and Guards

CI Guards

  • guard_external_models.yml β€” fail on mentions of gpt2|llama|mistral|qwen|phi|gemma|opt
  • push_to_hub.yml β€” publish metadata to HF (Free/Pro via ENV)

Security Scripts

  • guard_no_local_train.py β€” blocks local training
  • kill_switch.py β€” emergency resource shutdown

📦 Hugging Face Hub

Publishing Strategy

  • Today: push metadata (configs, tokenizer, README, MODEL_CARD)
  • Tomorrow (Pro): enable HF_HUB_ENABLE_HF_TRANSFER=1, multi-upload; weights β€” only after external training

Environment Variables

HUGGINGFACE_TOKEN=hf_***
HF_REPO=<user>/oracle850b-moe
HF_TIER=free   # switch to pro later
HF_HUB_ENABLE_HF_TRANSFER=0
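
A minimal metadata-only upload sketch using huggingface_hub, matching the strategy above; the hub_export/ staging directory is hypothetical, and ignore_patterns keeps weight files out of the push:

import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HUGGINGFACE_TOKEN"])
api.upload_folder(
    folder_path="hub_export/",                   # hypothetical staging dir
    repo_id=os.environ["HF_REPO"],
    repo_type="model",
    ignore_patterns=["*.bin", "*.safetensors"],  # metadata only, no weights
)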

🚀 Quick Start

1. Installation

# Clone the repository
git clone https://github.com/MagistrTheOne/oracle850b-moe.git
cd oracle850b-moe

# Create virtual environment
make venv
make install

# Set up environment variables
cp .env.example .env
# Edit .env with your values

2. Verification

# Run CI guards
make ci-guards

# Check project status
make status

# Run tests
make test

3. Data Preparation (dry-run)

# Run data pipeline preparation
make prep-tb

# Infrastructure planning
make infra-plan

4. Upload to HF Hub

# Upload metadata to Hugging Face Hub
make push-hf

πŸ“ Project Structure

oracle850b-moe/
├─ src/oracle/core/
│  ├─ modeling/          # MoE architecture
│  ├─ tokenization/      # Custom tokenizer
│  └─ serve/             # FastAPI server
├─ configs/
│  ├─ model/             # Model configs
│  ├─ training/          # Training configs
│  ├─ deepspeed/         # DeepSpeed configs
│  └─ serve/             # Serving configs
├─ datasets/scripts/     # Data processing scripts
├─ training/             # Launcher and scheduler
├─ infra/
│  ├─ terraform/         # Yandex Cloud infrastructure
│  ├─ helm/              # Kubernetes charts
│  └─ scripts/           # Management scripts
├─ ci/                   # CI/CD pipelines
├─ scripts/              # Utilities and uploads
└─ checkpoints/          # Checkpoints and prompts

🔧 Makefile Commands

make help          # Show help
make prep-tb       # Run data pipeline (dry-run)
make infra-plan    # Infrastructure planning
make ci-guards     # Run CI guards
make test          # Run tests
make clean         # Clean temporary files
make kill-all      # Emergency shutdown
make push-hf       # Upload to HF Hub

⚠️ Limitations

  • Local training prohibited β€” only cluster training
  • External models prohibited β€” only proprietary architecture
  • Python 3.11.9 β€” fixed dependency versions
  • Virtual environment β€” dependency isolation

📞 Support

📄 License

[License to be determined]


Disclaimer: Oracle850B is an experimental model. Use at your own risk. The author is not responsible for any consequences of use.
