---
language:
  - en
  - ru
license: other
library_name: transformers
pipeline_tag: text-generation
inference: false
tags:
  - moe
  - transformer
  - decoder-only
  - flash-attn
  - rope
  - rmsnorm
  - reasoning
  - long-context
  - vllm
  - oracle850b
  - m-infinity-1
widget:
  - text: |-
      <|oracle_sys|>...
      <|oracle_intro|>I am Oracle. Author: MagistrTheOne|Krasnodar|2025.
      <|user|>who are you?
      <|assistant|>
model-index:
  - name: oracle850b-moe
    results:
      - task:
          type: text-generation
          name: Text Generation / Reasoning
        dataset:
          name: GSM8K (clean eval)
          type: gsm8k
        metrics:
          - type: exact_match
            name: GSM8K pass@1
            value: null
            verified: false
      - task:
          type: text-generation
          name: Code Generation (HumanEval)
        dataset:
          name: HumanEval (clean eval)
          type: openai_humaneval
        metrics:
          - type: pass@1
            name: HumanEval pass@1
            value: null
            verified: false
---

# Oracle850B-MoE: Mixture of Experts Language Model


- **Project:** Oracle, a line of proprietary reasoning LLMs by M∞1 Corporation
- **Model:** Oracle850B-MoE (850B total parameters, Mixture of Experts with 128 experts)
- **Author:** MagistrTheOne|Krasnodar|2025
- **Repository:** MagistrTheOne/oracle850b-moe

Oracle850B-MoE is M∞1's proprietary decoder-only architecture with ≈850B total parameters (128 experts, top-k = 2, ≈180–220B active). OWN MODEL / NO EXTERNAL CHECKPOINTS. This repository covers data, infrastructure, and config preparation only; training itself is launched on an external cluster.

## 🔒 Strict Rules

1. NO LOCAL TRAIN. `ALLOW_LOCAL_TRAIN=false` - any training run fails with an explanatory message.
2. NO EXTERNAL WEIGHTS. Links to or downloads of GPT-2/LLaMA/Mistral/Qwen/Phi/Gemma/OPT etc. are prohibited. The CI guard is mandatory.
3. Preparation only: code, configs, mock artifacts, dry runs; mini-samples for pipeline verification.
4. Identity: the special tokens `<|oracle_sys|>`, `<|oracle_intro|>`, and `<|author|>` are auto-injected at serve time.

πŸ—οΈ Architecture

### MoE-850B Configuration

```json
{
  "model_name": "oracle850b-moe",
  "arch": "decoder-only",
  "param_total": 850000000000,
  "moe": {
    "experts": 128,
    "expert_hidden": 2816,
    "router": {"type": "topk", "k": 2, "load_balancing_loss": 0.01}
  },
  "dense": {"d_model": 8192, "n_layers": 96, "n_heads": 64, "d_ff": 24576},
  "activation": "swiglu",
  "rope_theta": 10000,
  "rotary_pct": 0.5,
  "rmsnorm_eps": 1e-5,
  "flash_attn": true,
  "kv_cache": true,
  "vocab_size": 131072,
  "max_seq_len": 16384,
  "fp": {"train": "bf16", "infer": "auto"}
}
```

Explanation: the total parameter count is ≈850B because of the expert pool; only 2 experts are active per token, so the active parameter count is ~180–220B. The aim is quality comparable to a dense ~200B-class model at lower per-token FLOPs.
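
As a quick sanity check on the headline number, the expert pool alone accounts for roughly 850B under one plausible reading of the config (a SwiGLU FFN of size `expert_hidden` per expert in every layer); this interpretation is an assumption, not something stated in the repository.

```python
# Back-of-the-envelope check that the expert pool alone accounts for the
# ~850B total. Assumption (not stated in the config): each expert is a
# SwiGLU FFN with three d_model x expert_hidden projections, replicated
# in every layer.

d_model = 8_192          # dense.d_model
n_layers = 96            # dense.n_layers
experts = 128            # moe.experts
expert_hidden = 2_816    # moe.expert_hidden

per_expert_ffn = 3 * d_model * expert_hidden   # gate + up + down projections
expert_pool = n_layers * experts * per_expert_ffn

print(f"params per expert FFN: {per_expert_ffn / 1e6:.1f}M")   # ~69.2M
print(f"expert pool total:     {expert_pool / 1e9:.1f}B")      # ~850.4B
```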

### Special Tokens

- `<|oracle_sys|>` - Oracle system token
- `<|oracle_intro|>` - Oracle introductory token
- `<|author|>` - author token (MagistrTheOne|Krasnodar|2025|850B)
- `<|endoftext|>` - end of text
- `<|pad|>` - padding
- `<|unk|>` - unknown token
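
A minimal sketch of how a serve-time prompt could be assembled from these tokens, mirroring the widget example in the metadata; `build_prompt` and the exact system/intro strings are illustrative assumptions, not the repository's actual serve code.

```python
# Illustrative prompt assembly with the Oracle special tokens.
# build_prompt() is a hypothetical helper, not part of the repository.

ORACLE_SYS = "<|oracle_sys|>"
ORACLE_INTRO = "<|oracle_intro|>"
USER = "<|user|>"
ASSISTANT = "<|assistant|>"

def build_prompt(system: str, user_message: str) -> str:
    intro = "I am Oracle. Author: MagistrTheOne|Krasnodar|2025."
    return (
        f"{ORACLE_SYS}{system}\n"
        f"{ORACLE_INTRO}{intro}\n"
        f"{USER}{user_message}\n"
        f"{ASSISTANT}"
    )

print(build_prompt("You are Oracle, a reasoning model.", "who are you?"))
```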

## 📊 TB-Scale Data Pipeline

### Pipeline Structure

```
(sources: S3/HTTPS)
    ↓ ingest.py
obj://oracle-data/raw/...            # Source data
    ↓ clean_generic.py
obj://oracle-data/clean/...          # Cleaned data
    ↓ decontaminate.py
obj://oracle-data/decontaminated/... # Decontaminated data
    ↓ shard_webdataset.py
obj://oracle-data/webdataset/...     # WebDataset shards
    ↓ stats.py
obj://oracle-data/stats/...          # Statistics and reports
```

### Processing Scripts

- `ingest.py` - intake from S3/HTTPS; JSON manifest (source, license, size, hashes)
- `clean_generic.py` - Unicode normalization, dedup (MinHash/LSH), language filtering (ru/en), PII, toxicity
- `decontaminate.py` - evaluation stop-lists; intersection reports (sketched below)
- `shard_webdataset.py` - packaging into tar shards (e.g., 512 MB), `.idx` index, map-style access
- `stats.py` - summaries (duplicates, languages, topics, lengths)
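
A minimal sketch of the decontamination idea (n-gram overlap against an evaluation stop-list). The helper names and the 13-gram window are illustrative assumptions; the real `decontaminate.py` may use a different scheme.

```python
# Illustrative n-gram decontamination check (not the actual decontaminate.py).
# A document is flagged if it shares any 13-gram with a held-out eval set.

def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def build_stoplist(eval_texts: list[str], n: int = 13) -> set[tuple[str, ...]]:
    stop = set()
    for t in eval_texts:
        stop |= ngrams(t, n)
    return stop

def is_contaminated(doc: str, stoplist: set[tuple[str, ...]], n: int = 13) -> bool:
    return not ngrams(doc, n).isdisjoint(stoplist)

eval_set = ["example GSM8K question text ..."]   # items from the eval stop-list
stoplist = build_stoplist(eval_set)
print(is_contaminated("some web document text ...", stoplist))
```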

## 🚀 Training: Parallelism and Checkpoints

### Training Configuration

```yaml
seq_len: 16384
micro_bsz: 1
global_bsz: 4096
grad_accum: 512
precision: bf16
parallelism:
  tensor: 16     # TP
  pipeline: 12   # PP (stages)
  sequence: true # SP (ops sharding)
moe:
  top_k: 2
  capacity_factor: 1.25
  zloss: 0.001
opt: adamw
lr: 8e-5
warmup_steps: 8000
max_steps: 800000
checkpoint:
  every_steps: 1000
  keep_last: 3
  s3_mirror: true
logging: json
```
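
A small sanity check on the batch accounting implied by this config; the data-parallel width is not stated above, so treating it as the leftover factor of `global_bsz` is an assumption.

```python
# Derive the data-parallel (DP) degree implied by the training config.
# Assumption: global_bsz = micro_bsz * grad_accum * dp_degree
# (the DP width is not stated explicitly in the config above).

micro_bsz = 1
grad_accum = 512
global_bsz = 4096
tensor_parallel = 16
pipeline_parallel = 12

dp_degree = global_bsz // (micro_bsz * grad_accum)        # -> 8 replicas
gpus_total = dp_degree * tensor_parallel * pipeline_parallel

print(f"data-parallel replicas:        {dp_degree}")
print(f"GPUs implied by DP x TP x PP:  {gpus_total}")     # -> 1536
```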

### Launcher Requirements

- Support for TP/PP/SP mapping across nodes/GPUs (16×TP, 12×PP)
- Elastic restart; automatic resume from the last complete checkpoint
- Dry run: verify the layout without starting the math

## ☁️ Cloud Orchestration

### Terraform (Yandex Cloud)

- VPC, Object Storage, Container Registry
- Kubernetes cluster with GPU nodes
- Budget constraints and alerts
- Monitoring and logging

### Helm Charts

- Charts for training and serving
- Resource configuration and tolerations
- Service accounts and RBAC

### Kill Switch

- Emergency stop of all pipelines
- Terraform resource destruction
- Pre-flight checks

πŸ›‘οΈ CI/CD and Guards

### CI Guards

- `guard_external_models.yml` - fail on mentions of gpt2|llama|mistral|qwen|phi|gemma|opt (see the sketch after this list)
- `push_to_hub.yml` - publish metadata to HF (Free/Pro selected via ENV)
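
A minimal sketch of what the external-model guard can look like; the file walk and regex below are illustrative, not the actual CI workflow.

```python
# Illustrative repo scan for forbidden external-model references
# (not the actual guard_external_models.yml workflow).
import re
import sys
from pathlib import Path

FORBIDDEN = re.compile(r"\b(gpt-?2|llama|mistral|qwen|phi|gemma|opt)\b", re.IGNORECASE)

def scan(repo_root: str = ".") -> int:
    hits = 0
    for path in Path(repo_root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if FORBIDDEN.search(line):
                print(f"{path}:{lineno}: forbidden model reference")
                hits += 1
    return hits

if __name__ == "__main__":
    sys.exit(1 if scan() else 0)
```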

### Security Scripts

- `guard_no_local_train.py` - blocks local training (see the sketch after this list)
- `kill_switch.py` - emergency resource shutdown
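
And a sketch of the local-train block, assuming the guard simply refuses to run unless `ALLOW_LOCAL_TRAIN` is explicitly set to true; the actual `guard_no_local_train.py` may differ.

```python
# Illustrative local-training guard (not the actual guard_no_local_train.py).
import os
import sys

def assert_no_local_train() -> None:
    if os.environ.get("ALLOW_LOCAL_TRAIN", "false").lower() != "true":
        sys.exit(
            "Local training is disabled (ALLOW_LOCAL_TRAIN=false). "
            "Launch training on the external cluster instead."
        )

if __name__ == "__main__":
    assert_no_local_train()
    print("Local training explicitly allowed; proceeding.")
```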

## 📦 Hugging Face Hub

### Publishing Strategy

- Today: push metadata only (configs, tokenizer, README, MODEL_CARD)
- Tomorrow (Pro): enable HF_HUB_ENABLE_HF_TRANSFER=1 and multi-part upload; weights only after external training completes

### Environment Variables

```bash
HUGGINGFACE_TOKEN=hf_***
HF_REPO=<user>/oracle850b-moe
HF_TIER=free   # switch to pro later
HF_HUB_ENABLE_HF_TRANSFER=0
```
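
A minimal sketch of the metadata-only push via `huggingface_hub`; the folder layout and `allow_patterns` below are assumptions about this repository, not taken from `push_to_hub.yml`.

```python
# Illustrative metadata-only upload to the Hub (weights deliberately excluded).
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HUGGINGFACE_TOKEN"])
repo_id = os.environ.get("HF_REPO", "MagistrTheOne/oracle850b-moe")

api.create_repo(repo_id, repo_type="model", exist_ok=True)
api.upload_folder(
    folder_path=".",
    repo_id=repo_id,
    repo_type="model",
    allow_patterns=["README.md", "MODEL_CARD.md", "configs/**", "tokenizer/**"],
    ignore_patterns=["*.safetensors", "*.bin", "checkpoints/**"],
    commit_message="Upload metadata (configs, tokenizer, model card)",
)
```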

## 🚀 Quick Start

### 1. Installation

```bash
# Clone the repository
git clone https://github.com/MagistrTheOne/oracle850b-moe.git
cd oracle850b-moe

# Create virtual environment
make venv
make install

# Set up environment variables
cp .env.example .env
# Edit .env with your values
```

### 2. Verification

```bash
# Run CI guards
make ci-guards

# Check project status
make status

# Run tests
make test
```

### 3. Data Preparation (dry-run)

```bash
# Run data pipeline preparation
make prep-tb

# Infrastructure planning
make infra-plan
```

### 4. Upload to HF Hub

```bash
# Upload metadata to Hugging Face Hub
make push-hf
```

πŸ“ Project Structure

```
oracle850b-moe/
├─ src/oracle/core/
│  ├─ modeling/          # MoE architecture
│  ├─ tokenization/      # Custom tokenizer
│  └─ serve/             # FastAPI server
├─ configs/
│  ├─ model/             # Model configs
│  ├─ training/          # Training configs
│  ├─ deepspeed/         # DeepSpeed configs
│  └─ serve/             # Serving configs
├─ datasets/scripts/     # Data processing scripts
├─ training/             # Launcher and scheduler
├─ infra/
│  ├─ terraform/         # Yandex Cloud infrastructure
│  ├─ helm/              # Kubernetes charts
│  └─ scripts/           # Management scripts
├─ ci/                   # CI/CD pipelines
├─ scripts/              # Utilities and uploads
└─ checkpoints/          # Checkpoints and prompts
```

## 🔧 Makefile Commands

```bash
make help          # Show help
make prep-tb       # Run data pipeline (dry-run)
make infra-plan    # Infrastructure planning
make ci-guards     # Run CI guards
make test          # Run tests
make clean         # Clean temporary files
make kill-all      # Emergency shutdown
make push-hf       # Upload to HF Hub
```

## ⚠️ Limitations

- Local training prohibited - training runs only on the external cluster
- External models prohibited - only the proprietary architecture
- Python 3.11.9 - pinned dependency versions
- Virtual environment - dependency isolation

## 📞 Support

## 📄 License

[License to be determined]


Disclaimer: Oracle850B is an experimental model. Use at your own risk. The author is not responsible for any consequences of use.