---
language:
  - en
  - ru
license: other
library_name: transformers
pipeline_tag: text-generation
inference: false
tags:
  - moe
  - transformer
  - decoder-only
  - flash-attn
  - rope
  - rmsnorm
  - reasoning
  - long-context
  - vllm
  - oracle850b
  - m-infinity-1
widget:
  - text: |-
      <|oracle_sys|>...
      <|oracle_intro|>I am Oracle. Author: MagistrTheOne|Krasnodar|2025.
      <|user|>who are you?
      <|assistant|>
model-index:
  - name: oracle850b-moe
    results:
      - task:
          type: text-generation
          name: Text Generation / Reasoning
        dataset:
          name: GSM8K (clean eval)
          type: gsm8k
        metrics:
          - type: exact_match
            name: GSM8K pass@1
            value: null
            verified: false
      - task:
          type: text-generation
          name: Code Generation (HumanEval)
        dataset:
          name: HumanEval (clean eval)
          type: openai_humaneval
        metrics:
          - type: pass@1
            name: HumanEval pass@1
            value: null
            verified: false
---

# Oracle850B-MoE: Mixture of Experts Language Model


- **Project:** Oracle, a line of proprietary reasoning LLMs by M∞1 Corporation
- **Model:** Oracle850B-MoE (850B total parameters, Mixture of Experts with 128 experts)
- **Author:** MagistrTheOne|Krasnodar|2025
- **Repository:** MagistrTheOne/oracle850b-moe

Oracle850B-MoE is M∞1's proprietary decoder-only architecture with ≈850B total parameters (128 experts, top-k = 2, ≈180–220B active). OWN MODEL / NO EXTERNAL CHECKPOINTS. This repository covers data, infrastructure, and config preparation only; training itself is launched on an external cluster.

## 🔒 Strict Rules

1. NO LOCAL TRAIN. `ALLOW_LOCAL_TRAIN=false` - any training run fails with an explanatory message.
2. NO EXTERNAL WEIGHTS. Links to or downloads of GPT-2/LLaMA/Mistral/Qwen/Phi/Gemma/OPT etc. are prohibited. The CI guard is mandatory.
3. Preparation only: code, configs, mock artifacts, dry runs; mini-samples for pipeline verification.
4. Identity: the special tokens `<|oracle_sys|>`, `<|oracle_intro|>`, and `<|author|>` are auto-injected at serve time.

πŸ—οΈ Architecture

### MoE-850B Configuration

```json
{
  "model_name": "oracle850b-moe",
  "arch": "decoder-only",
  "param_total": 850000000000,
  "moe": {
    "experts": 128,
    "expert_hidden": 2816,
    "router": {"type": "topk", "k": 2, "load_balancing_loss": 0.01}
  },
  "dense": {"d_model": 8192, "n_layers": 96, "n_heads": 64, "d_ff": 24576},
  "activation": "swiglu",
  "rope_theta": 10000,
  "rotary_pct": 0.5,
  "rmsnorm_eps": 1e-5,
  "flash_attn": true,
  "kv_cache": true,
  "vocab_size": 131072,
  "max_seq_len": 16384,
  "fp": {"train": "bf16", "infer": "auto"}
}
```

Explanation: the total parameter count is ≈850B because of the expert pool; only 2 experts are active per token, so the active parameter count is ~180–220B. The aim is quality comparable to a dense ~200B-class model at lower per-token FLOPs.
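
As a quick sanity check on the headline number, the expert pool alone accounts for roughly 850B under one plausible reading of the config (a SwiGLU FFN of size `expert_hidden` per expert in every layer); this interpretation is an assumption, not something stated in the repository.

```python
# Back-of-the-envelope check that the expert pool alone accounts for the
# ~850B total. Assumption (not stated in the config): each expert is a
# SwiGLU FFN with three d_model x expert_hidden projections, replicated
# in every layer.

d_model = 8_192          # dense.d_model
n_layers = 96            # dense.n_layers
experts = 128            # moe.experts
expert_hidden = 2_816    # moe.expert_hidden

per_expert_ffn = 3 * d_model * expert_hidden   # gate + up + down projections
expert_pool = n_layers * experts * per_expert_ffn

print(f"params per expert FFN: {per_expert_ffn / 1e6:.1f}M")   # ~69.2M
print(f"expert pool total:     {expert_pool / 1e9:.1f}B")      # ~850.4B
```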

### Special Tokens

- `<|oracle_sys|>` - Oracle system token
- `<|oracle_intro|>` - Oracle introductory token
- `<|author|>` - author token (MagistrTheOne|Krasnodar|2025|850B)
- `<|endoftext|>` - end of text
- `<|pad|>` - padding
- `<|unk|>` - unknown token
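
A minimal sketch of how a serve-time prompt could be assembled from these tokens, mirroring the widget example in the metadata; `build_prompt` and the exact system/intro strings are illustrative assumptions, not the repository's actual serve code.

```python
# Illustrative prompt assembly with the Oracle special tokens.
# build_prompt() is a hypothetical helper, not part of the repository.

ORACLE_SYS = "<|oracle_sys|>"
ORACLE_INTRO = "<|oracle_intro|>"
USER = "<|user|>"
ASSISTANT = "<|assistant|>"

def build_prompt(system: str, user_message: str) -> str:
    intro = "I am Oracle. Author: MagistrTheOne|Krasnodar|2025."
    return (
        f"{ORACLE_SYS}{system}\n"
        f"{ORACLE_INTRO}{intro}\n"
        f"{USER}{user_message}\n"
        f"{ASSISTANT}"
    )

print(build_prompt("You are Oracle, a reasoning model.", "who are you?"))
```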

## 📊 TB-Scale Data Pipeline

### Pipeline Structure

```
(sources: S3/HTTPS)
    ↓ ingest.py
obj://oracle-data/raw/...            # Source data
    ↓ clean_generic.py
obj://oracle-data/clean/...          # Cleaned data
    ↓ decontaminate.py
obj://oracle-data/decontaminated/... # Decontaminated data
    ↓ shard_webdataset.py
obj://oracle-data/webdataset/...     # WebDataset shards
    ↓ stats.py
obj://oracle-data/stats/...          # Statistics and reports
```

### Processing Scripts

- `ingest.py` - intake from S3/HTTPS; JSON manifest (source, license, size, hashes)
- `clean_generic.py` - Unicode normalization, dedup (MinHash/LSH), language filtering (ru/en), PII, toxicity
- `decontaminate.py` - evaluation stop-lists; intersection reports (sketched below)
- `shard_webdataset.py` - packaging into tar shards (e.g., 512 MB), `.idx` index, map-style access
- `stats.py` - summaries (duplicates, languages, topics, lengths)
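
A minimal sketch of the decontamination idea (n-gram overlap against an evaluation stop-list). The helper names and the 13-gram window are illustrative assumptions; the real `decontaminate.py` may use a different scheme.

```python
# Illustrative n-gram decontamination check (not the actual decontaminate.py).
# A document is flagged if it shares any 13-gram with a held-out eval set.

def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def build_stoplist(eval_texts: list[str], n: int = 13) -> set[tuple[str, ...]]:
    stop = set()
    for t in eval_texts:
        stop |= ngrams(t, n)
    return stop

def is_contaminated(doc: str, stoplist: set[tuple[str, ...]], n: int = 13) -> bool:
    return not ngrams(doc, n).isdisjoint(stoplist)

eval_set = ["example GSM8K question text ..."]   # items from the eval stop-list
stoplist = build_stoplist(eval_set)
print(is_contaminated("some web document text ...", stoplist))
```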

## 🚀 Training: Parallelism and Checkpoints

### Training Configuration

```yaml
seq_len: 16384
micro_bsz: 1
global_bsz: 4096
grad_accum: 512
precision: bf16
parallelism:
  tensor: 16     # TP
  pipeline: 12   # PP (stages)
  sequence: true # SP (ops sharding)
moe:
  top_k: 2
  capacity_factor: 1.25
  zloss: 0.001
opt: adamw
lr: 8e-5
warmup_steps: 8000
max_steps: 800000
checkpoint:
  every_steps: 1000
  keep_last: 3
  s3_mirror: true
logging: json
```
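
A small sanity check on the batch accounting implied by this config; the data-parallel width is not stated above, so treating it as the leftover factor of `global_bsz` is an assumption.

```python
# Derive the data-parallel (DP) degree implied by the training config.
# Assumption: global_bsz = micro_bsz * grad_accum * dp_degree
# (the DP width is not stated explicitly in the config above).

micro_bsz = 1
grad_accum = 512
global_bsz = 4096
tensor_parallel = 16
pipeline_parallel = 12

dp_degree = global_bsz // (micro_bsz * grad_accum)        # -> 8 replicas
gpus_total = dp_degree * tensor_parallel * pipeline_parallel

print(f"data-parallel replicas:        {dp_degree}")
print(f"GPUs implied by DP x TP x PP:  {gpus_total}")     # -> 1536
```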

### Launcher Requirements

- Support for TP/PP/SP mapping across nodes/GPUs (16×TP, 12×PP)
- Elastic restart; automatic resume from the last complete checkpoint
- Dry run: verify the layout without starting the math

## ☁️ Cloud Orchestration

### Terraform (Yandex Cloud)

- VPC, Object Storage, Container Registry
- Kubernetes cluster with GPU nodes
- Budget constraints and alerts
- Monitoring and logging

### Helm Charts

- Charts for training and serving
- Resource configuration and tolerations
- Service accounts and RBAC

### Kill Switch

- Emergency stop of all pipelines
- Terraform resource destruction
- Pre-flight checks

πŸ›‘οΈ CI/CD and Guards

### CI Guards

- `guard_external_models.yml` - fail on mentions of gpt2|llama|mistral|qwen|phi|gemma|opt (see the sketch after this list)
- `push_to_hub.yml` - publish metadata to HF (Free/Pro selected via ENV)
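
A minimal sketch of what the external-model guard can look like; the file walk and regex below are illustrative, not the actual CI workflow.

```python
# Illustrative repo scan for forbidden external-model references
# (not the actual guard_external_models.yml workflow).
import re
import sys
from pathlib import Path

FORBIDDEN = re.compile(r"\b(gpt-?2|llama|mistral|qwen|phi|gemma|opt)\b", re.IGNORECASE)

def scan(repo_root: str = ".") -> int:
    hits = 0
    for path in Path(repo_root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if FORBIDDEN.search(line):
                print(f"{path}:{lineno}: forbidden model reference")
                hits += 1
    return hits

if __name__ == "__main__":
    sys.exit(1 if scan() else 0)
```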

### Security Scripts

- `guard_no_local_train.py` - blocks local training (see the sketch after this list)
- `kill_switch.py` - emergency resource shutdown
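
And a sketch of the local-train block, assuming the guard simply refuses to run unless `ALLOW_LOCAL_TRAIN` is explicitly set to true; the actual `guard_no_local_train.py` may differ.

```python
# Illustrative local-training guard (not the actual guard_no_local_train.py).
import os
import sys

def assert_no_local_train() -> None:
    if os.environ.get("ALLOW_LOCAL_TRAIN", "false").lower() != "true":
        sys.exit(
            "Local training is disabled (ALLOW_LOCAL_TRAIN=false). "
            "Launch training on the external cluster instead."
        )

if __name__ == "__main__":
    assert_no_local_train()
    print("Local training explicitly allowed; proceeding.")
```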

## 📦 Hugging Face Hub

### Publishing Strategy

- Today: push metadata only (configs, tokenizer, README, MODEL_CARD)
- Tomorrow (Pro): enable HF_HUB_ENABLE_HF_TRANSFER=1 and multi-part upload; weights only after external training completes

### Environment Variables

```bash
HUGGINGFACE_TOKEN=hf_***
HF_REPO=<user>/oracle850b-moe
HF_TIER=free   # switch to pro later
HF_HUB_ENABLE_HF_TRANSFER=0
```
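
A minimal sketch of the metadata-only push via `huggingface_hub`; the folder layout and `allow_patterns` below are assumptions about this repository, not taken from `push_to_hub.yml`.

```python
# Illustrative metadata-only upload to the Hub (weights deliberately excluded).
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HUGGINGFACE_TOKEN"])
repo_id = os.environ.get("HF_REPO", "MagistrTheOne/oracle850b-moe")

api.create_repo(repo_id, repo_type="model", exist_ok=True)
api.upload_folder(
    folder_path=".",
    repo_id=repo_id,
    repo_type="model",
    allow_patterns=["README.md", "MODEL_CARD.md", "configs/**", "tokenizer/**"],
    ignore_patterns=["*.safetensors", "*.bin", "checkpoints/**"],
    commit_message="Upload metadata (configs, tokenizer, model card)",
)
```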

## 🚀 Quick Start

### 1. Installation

```bash
# Clone the repository
git clone https://github.com/MagistrTheOne/oracle850b-moe.git
cd oracle850b-moe

# Create virtual environment
make venv
make install

# Set up environment variables
cp .env.example .env
# Edit .env with your values
```

### 2. Verification

```bash
# Run CI guards
make ci-guards

# Check project status
make status

# Run tests
make test
```

### 3. Data Preparation (dry-run)

```bash
# Run data pipeline preparation
make prep-tb

# Infrastructure planning
make infra-plan
```

### 4. Upload to HF Hub

```bash
# Upload metadata to Hugging Face Hub
make push-hf
```

πŸ“ Project Structure

```
oracle850b-moe/
├─ src/oracle/core/
│  ├─ modeling/          # MoE architecture
│  ├─ tokenization/      # Custom tokenizer
│  └─ serve/             # FastAPI server
├─ configs/
│  ├─ model/             # Model configs
│  ├─ training/          # Training configs
│  ├─ deepspeed/         # DeepSpeed configs
│  └─ serve/             # Serving configs
├─ datasets/scripts/     # Data processing scripts
├─ training/             # Launcher and scheduler
├─ infra/
│  ├─ terraform/         # Yandex Cloud infrastructure
│  ├─ helm/              # Kubernetes charts
│  └─ scripts/           # Management scripts
├─ ci/                   # CI/CD pipelines
├─ scripts/              # Utilities and uploads
└─ checkpoints/          # Checkpoints and prompts
```

## 🔧 Makefile Commands

```bash
make help          # Show help
make prep-tb       # Run data pipeline (dry-run)
make infra-plan    # Infrastructure planning
make ci-guards     # Run CI guards
make test          # Run tests
make clean         # Clean temporary files
make kill-all      # Emergency shutdown
make push-hf       # Upload to HF Hub
```

## ⚠️ Limitations

- Local training prohibited - training runs only on the external cluster
- External models prohibited - only the proprietary architecture
- Python 3.11.9 - pinned dependency versions
- Virtual environment - dependency isolation

## 📞 Support

## 📄 License

[License to be determined]


Disclaimer: Oracle850B is an experimental model. Use at your own risk. The author is not responsible for any consequences of use.