---
language:
- en
- ru
license: other
library_name: transformers
pipeline_tag: text-generation
inference: false
tags: [moe, transformer, decoder-only, flash-attn, rope, rmsnorm, reasoning, long-context, vllm, oracle850b, m-infinity-1]
widget:
- text: "<|oracle_sys|>...\n<|oracle_intro|>I am Oracle. Author: MagistrTheOne|Krasnodar|2025.\n<|user|>who are you?\n<|assistant|>"
model-index:
- name: oracle850b-moe
  results:
  - task: {type: text-generation, name: Text Generation / Reasoning}
    dataset: {name: GSM8K (clean eval), type: gsm8k}
    metrics: [{type: exact_match, name: GSM8K pass@1, value: null, verified: false}]
  - task: {type: text-generation, name: Code Generation (HumanEval)}
    dataset: {name: HumanEval (clean eval), type: openai_humaneval}
    metrics: [{type: pass@1, name: HumanEval pass@1, value: null, verified: false}]
---
# Oracle850B-MoE – Mixture-of-Experts Language Model
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-MagistrTheOne%2Foracle850b--moe-blue)](https://huggingface.co/MagistrTheOne/oracle850b-moe)
[![GitHub Release](https://img.shields.io/badge/GitHub-Release-blue.svg)](https://github.com/MagistrTheOne/oracle850b-moe/releases)
[![License](https://img.shields.io/badge/License-Proprietary%20Research-red.svg)](LICENSE)
[![CI Status](https://img.shields.io/badge/CI-Guard%20External%20Models-green.svg)](ci/guard_external_models.yml)
[![Model Size](https://img.shields.io/badge/Parameters-850B-orange.svg)](#model-architecture)
[![Architecture](https://img.shields.io/badge/Architecture-MoE%20Transformer-blue.svg)](#model-architecture)
**Project:** Oracle, a line of proprietary reasoning LLMs by M∞1 Corporation
**Model:** `Oracle850B-MoE` (850B parameters, **Mixture of Experts**, 128 experts)
**Author:** `MagistrTheOne|Krasnodar|2025`
**Repository:** [MagistrTheOne/oracle850b-moe](https://github.com/MagistrTheOne/oracle850b-moe)
> **Oracle850B-MoE** is M∞1's proprietary architecture with a total of ≈850B parameters (128 experts, top-k=2, ≈180–220B active). **OWN MODEL / NO EXTERNAL CHECKPOINTS**. This repository covers data, infrastructure, and config preparation; training itself runs on an external cluster.
## 🔒 Strict Rules
1. **NO LOCAL TRAIN**. `ALLOW_LOCAL_TRAIN=false`: any training run fails fast with an explanatory message (see the sketch after this list).
2. **NO EXTERNAL WEIGHTS**. Linking to or downloading GPT-2/LLaMA/Mistral/Qwen/Phi/Gemma/OPT and similar checkpoints is prohibited. The CI guard is mandatory.
3. **Preparation only**: code, configs, mock artifacts, dry-runs; mini-samples for pipeline verification.
4. **Identity**: special tokens `<|oracle_sys|>`, `<|oracle_intro|>`, `<|author|>` are auto-injected at serve time.
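A minimal sketch of how rule 1 can be enforced; the repo's actual `guard_no_local_train.py` is assumed to behave similarly but has not been inspected:
```python
# Hypothetical guard in the spirit of guard_no_local_train.py:
# abort any training entrypoint unless ALLOW_LOCAL_TRAIN is explicitly true.
import os
import sys

if os.environ.get("ALLOW_LOCAL_TRAIN", "false").lower() != "true":
    sys.exit("Training is disabled here: ALLOW_LOCAL_TRAIN=false. "
             "Launch training only on the external cluster.")
```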
## πŸ—οΈ Architecture
### MoE-850B Configuration
```json
{
  "model_name": "oracle850b-moe",
  "arch": "decoder-only",
  "param_total": 850000000000,
  "moe": {
    "experts": 128,
    "expert_hidden": 2816,
    "router": {"type": "topk", "k": 2, "load_balancing_loss": 0.01}
  },
  "dense": {"d_model": 8192, "n_layers": 96, "n_heads": 64, "d_ff": 24576},
  "activation": "swiglu",
  "rope_theta": 10000,
  "rotary_pct": 0.5,
  "rmsnorm_eps": 1e-5,
  "flash_attn": true,
  "kv_cache": true,
  "vocab_size": 131072,
  "max_seq_len": 16384,
  "fp": {"train": "bf16", "infer": "auto"}
}
```
**Explanation:** the total parameter count reaches ≈850B through the expert pool; only 2 experts are active per token, so the "active parameters" are ~180–220B. The aim is quality comparable to a 200B-class dense model at a fraction of the per-token FLOPs.
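As a sanity check, a minimal back-of-envelope sketch (assuming each of the 96 layers carries its own pool of 128 SwiGLU experts with three projection matrices, which the config does not state explicitly) reproduces the headline total:
```python
# Where does the ≈850B figure come from? Count only the expert pool.
d_model, n_layers = 8192, 96
experts, expert_hidden = 128, 2816

params_per_expert = 3 * d_model * expert_hidden   # SwiGLU: gate, up, down
expert_pool = n_layers * experts * params_per_expert
print(f"expert pool ≈ {expert_pool / 1e9:.0f}B")  # -> 850B
```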
### Special Tokens
- `<|oracle_sys|>` – Oracle system token
- `<|oracle_intro|>` – Oracle introductory token
- `<|author|>` – author token (MagistrTheOne|Krasnodar|2025|850B)
- `<|endoftext|>` – end of text
- `<|pad|>` – padding
- `<|unk|>` – unknown token
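For illustration, a minimal sketch of serve-time identity injection, mirroring the widget prompt in the metadata above; the exact template used in `src/oracle/core/serve` is an assumption:
```python
# Hypothetical prompt assembly; system/intro wording follows the widget example.
SYSTEM = "<|oracle_sys|>...\n"
INTRO = "<|oracle_intro|>I am Oracle. Author: MagistrTheOne|Krasnodar|2025.\n"

def build_prompt(user_message: str) -> str:
    """Auto-inject the Oracle identity tokens ahead of every user turn."""
    return f"{SYSTEM}{INTRO}<|user|>{user_message}\n<|assistant|>"

print(build_prompt("who are you?"))
```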
## 📊 TB-Scale Data Pipeline
### Pipeline Structure
```
obj://oracle-data/raw/...              # Source data (landed by ingest.py)
        ↓ clean_generic.py
obj://oracle-data/clean/...            # Cleaned data
        ↓ decontaminate.py
obj://oracle-data/decontaminated/...   # Decontaminated data
        ↓ shard_webdataset.py
obj://oracle-data/webdataset/...       # WebDataset shards
        ↓ stats.py
obj://oracle-data/stats/...            # Statistics and reports
```
### Processing Scripts
- **`ingest.py`** – intake from S3/HTTPS; writes a JSON manifest (source, license, size, hashes)
- **`clean_generic.py`** – Unicode normalization, dedup (MinHash/LSH, sketched below), language ID (ru/en), PII and toxicity filtering
- **`decontaminate.py`** – evaluation stop-lists; intersection reports
- **`shard_webdataset.py`** – packaging into tar shards (e.g., 512 MB) with a `.idx` index, map-style access
- **`stats.py`** – summaries (duplicates, languages, topics, lengths)
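A minimal sketch of the MinHash/LSH near-duplicate check, using the `datasketch` library; the repo's actual shingling scheme and threshold are assumptions:
```python
from datasketch import MinHash, MinHashLSH

def signature(text: str, num_perm: int = 128) -> MinHash:
    """Hash a document's word 5-gram shingles into a MinHash signature."""
    m = MinHash(num_perm=num_perm)
    words = text.split()
    for i in range(max(len(words) - 4, 1)):
        m.update(" ".join(words[i:i + 5]).encode("utf-8"))
    return m

# Toy corpus; the real pipeline would stream documents from the raw/ stage.
corpus = {"doc-1": "first toy document ...", "doc-2": "second toy document ..."}

# Pairs whose estimated Jaccard similarity exceeds ~0.8 collide in the index.
lsh = MinHashLSH(threshold=0.8, num_perm=128)
for key, text in corpus.items():
    sig = signature(text)
    if lsh.query(sig):        # an earlier document already looks like this one
        print(f"dropping near-duplicate: {key}")
    else:
        lsh.insert(key, sig)
```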
## 🚀 Training: Parallelism and Checkpoints
### Training Configuration
```yaml
seq_len: 16384
micro_bsz: 1
global_bsz: 4096
grad_accum: 512
precision: bf16
parallelism:
  tensor: 16      # TP
  pipeline: 12    # PP (stages)
  sequence: true  # SP (ops sharding)
moe:
  top_k: 2
  capacity_factor: 1.25
  zloss: 0.001
opt: adamw
lr: 8e-5
warmup_steps: 8000
max_steps: 800000
checkpoint:
  every_steps: 1000
  keep_last: 3
  s3_mirror: true
logging: json
```
### Launcher Requirements
- Support for **TP/PP/SP** mapping across nodes/GPUs (16×TP, 12×PP)
- **Elastic** restart with automatic resume from the last complete checkpoint
- Dry-run mode: verify the device layout without launching any compute (see the sketch below)
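A minimal dry-run-style consistency check derived from the config above; the assumption that data parallelism fills the remaining batch dimension is mine, not the repo's:
```python
# Sanity-check the TP/PP/DP layout implied by the training config.
tp, pp = 16, 12
global_bsz, micro_bsz, grad_accum = 4096, 1, 512

dp = global_bsz // (micro_bsz * grad_accum)   # data-parallel replicas -> 8
assert global_bsz == micro_bsz * grad_accum * dp
world_size = tp * pp * dp                     # GPUs required -> 1536
print(f"DP={dp}, world size={world_size} GPUs")
```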
## ☁️ Cloud Orchestration
### Terraform (Yandex Cloud)
- VPC, Object Storage, Container Registry
- Kubernetes cluster with GPU nodes
- Budget constraints and alerts
- Monitoring and logging
### Helm Charts
- Charts for training and serving
- Resource configuration and tolerations
- Service accounts and RBAC
### Kill Switch
- Emergency stop of all pipelines
- Terraform resource destruction
- Pre-flight checks
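A hypothetical outline of the kill-switch flow; the release name and script wiring are illustrative, not taken from `infra/scripts`:
```python
# Hypothetical kill switch: stop workloads first, then destroy infrastructure.
import subprocess

def kill_all(confirm: bool = False) -> None:
    if not confirm:
        raise SystemExit("refusing to run without confirm=True")
    # Stop training/serving workloads (Helm release name is illustrative).
    subprocess.run(["helm", "uninstall", "oracle-train"], check=False)
    # Tear down all Terraform-managed cloud resources.
    subprocess.run(["terraform", "destroy", "-auto-approve"],
                   cwd="infra/terraform", check=False)
```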
## 🛡️ CI/CD and Guards
### CI Guards
- **`guard_external_models.yml`** – fail on any mention of `gpt2|llama|mistral|qwen|phi|gemma|opt` (a minimal sketch follows below)
- **`push_to_hub.yml`** – publish metadata to HF (Free/Pro selected via ENV)
### Security Scripts
- **`guard_no_local_train.py`** – blocks local training
- **`kill_switch.py`** – emergency resource shutdown
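A minimal sketch of the external-model guard's core check; the file scope and exit behavior are assumptions about what `guard_external_models.yml` runs:
```python
# Scan the source tree for banned external-model references; fail CI if any appear.
import pathlib
import re
import sys

BANNED = re.compile(r"\b(gpt2|llama|mistral|qwen|phi|gemma|opt)\b", re.IGNORECASE)

violations = [
    (path, lineno)
    # Scan src/ only so the guard's own regex does not flag itself.
    for path in pathlib.Path("src").rglob("*.py")
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1)
    if BANNED.search(line)
]
for path, lineno in violations:
    print(f"{path}:{lineno}: banned external-model reference")
sys.exit(1 if violations else 0)
```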
## 📦 Hugging Face Hub
### Publishing Strategy
- **Today**: push **metadata** only (configs, tokenizer, README, MODEL_CARD)
- **Tomorrow (Pro)**: enable `HF_HUB_ENABLE_HF_TRANSFER=1` and multi-part uploads; weights only after external training completes
### Environment Variables
```bash
HUGGINGFACE_TOKEN=hf_***
HF_REPO=<user>/oracle850b-moe
HF_TIER=free # switch to pro later
HF_HUB_ENABLE_HF_TRANSFER=0
```
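A minimal metadata-push sketch using the official `huggingface_hub` client; the staging folder name is hypothetical, and `make push-hf` may wrap something similar:
```python
import os

from huggingface_hub import HfApi

api = HfApi(token=os.environ["HUGGINGFACE_TOKEN"])
api.upload_folder(
    folder_path="hf_export",        # hypothetical staging dir: configs, tokenizer, README
    repo_id=os.environ["HF_REPO"],  # e.g. <user>/oracle850b-moe
    repo_type="model",
    commit_message="Upload metadata (no weights)",
    allow_patterns=["*.json", "*.md", "tokenizer*"],  # metadata only, never checkpoints
)
```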
## 🚀 Quick Start
### 1. Installation
```bash
# Clone the repository
git clone https://github.com/MagistrTheOne/oracle850b-moe.git
cd oracle850b-moe
# Create virtual environment
make venv
make install
# Set up environment variables
cp .env.example .env
# Edit .env with your values
```
### 2. Verification
```bash
# Run CI guards
make ci-guards
# Check project status
make status
# Run tests
make test
```
### 3. Data Preparation (dry-run)
```bash
# Run data pipeline preparation
make prep-tb
# Infrastructure planning
make infra-plan
```
### 4. Upload to HF Hub
```bash
# Upload metadata to Hugging Face Hub
make push-hf
```
## 📁 Project Structure
```
oracle850b-moe/
├─ src/oracle/core/
│  ├─ modeling/        # MoE architecture
│  ├─ tokenization/    # Custom tokenizer
│  └─ serve/           # FastAPI server
├─ configs/
│  ├─ model/           # Model configs
│  ├─ training/        # Training configs
│  ├─ deepspeed/       # DeepSpeed configs
│  └─ serve/           # Serving configs
├─ datasets/scripts/   # Data processing scripts
├─ training/           # Launcher and scheduler
├─ infra/
│  ├─ terraform/       # Yandex Cloud infrastructure
│  ├─ helm/            # Kubernetes charts
│  └─ scripts/         # Management scripts
├─ ci/                 # CI/CD pipelines
├─ scripts/            # Utilities and uploads
└─ checkpoints/        # Checkpoints and prompts
```
## 🔧 Makefile Commands
```bash
make help # Show help
make prep-tb # Run data pipeline (dry-run)
make infra-plan # Infrastructure planning
make ci-guards # Run CI guards
make test # Run tests
make clean # Clean temporary files
make kill-all # Emergency shutdown
make push-hf # Upload to HF Hub
```
## ⚠️ Limitations
- **Local training prohibited** – cluster training only
- **External models prohibited** – proprietary architecture only
- **Python 3.11.9** – pinned dependency versions
- **Virtual environment** – dependency isolation
## 📞 Support
- **Author**: MagistrTheOne|Krasnodar|2025|850B
- **Repository**: https://github.com/MagistrTheOne/oracle850b-moe
- **HF Hub**: https://huggingface.co/MagistrTheOne/oracle850b-moe
## 📄 License
[License to be determined]
---
> **Disclaimer**: Oracle850B is an experimental model. Use at your own risk. The author is not responsible for any consequences of use.