---
language:
- en
- ru
license: other
library_name: transformers
pipeline_tag: text-generation
inference: false
tags: [moe, transformer, decoder-only, flash-attn, rope, rmsnorm, reasoning, long-context, vllm, oracle850b, m-infinity-1]
widget:
- text: "<|oracle_sys|>...\n<|oracle_intro|>I am Oracle. Author: MagistrTheOne|Krasnodar|2025.\n<|user|>who are you?\n<|assistant|>"
model-index:
- name: oracle850b-moe
  results:
  - task: {type: text-generation, name: Text Generation / Reasoning}
    dataset: {name: GSM8K (clean eval), type: gsm8k}
    metrics: [{type: exact_match, name: GSM8K pass@1, value: null, verified: false}]
  - task: {type: text-generation, name: Code Generation (HumanEval)}
    dataset: {name: HumanEval (clean eval), type: openai_humaneval}
    metrics: [{type: pass@1, name: HumanEval pass@1, value: null, verified: false}]
---

# Oracle850B-MoE – Mixture of Experts Language Model

[Hugging Face Hub](https://huggingface.co/MagistrTheOne/oracle850b-moe)
[Releases](https://github.com/MagistrTheOne/oracle850b-moe/releases)
[License](LICENSE)
[CI Guard](ci/guard_external_models.yml)
[Architecture](#model-architecture)

**Project:** Oracle – a line of proprietary reasoning LLMs by M∞1 Corporation

**Model:** `Oracle850B-MoE` (850B parameters, **Mixture of Experts**, 128 experts)

**Author:** `MagistrTheOne|Krasnodar|2025`

**Repository:** [MagistrTheOne/oracle850b-moe](https://github.com/MagistrTheOne/oracle850b-moe)

> **Oracle850B-MoE** – M∞1's proprietary architecture with a total of ≈850B parameters (128 experts, top-k=2, ≈180–220B active). **OWN MODEL / NO EXTERNAL CHECKPOINTS**. This repository covers data, infrastructure, and config preparation; training itself is launched on an external cluster.

## Strict Rules

1. **NO LOCAL TRAIN.** `ALLOW_LOCAL_TRAIN=false` – any local training run fails fast with an explanatory message.
2. **NO EXTERNAL WEIGHTS.** Links to or downloads of GPT-2/LLaMA/Mistral/Qwen/Phi/Gemma/OPT etc. are prohibited. The CI guard is mandatory.
3. **Preparation only**: code, configs, mock artifacts, dry runs; mini-samples for pipeline verification.
4. **Identity**: special tokens `<|oracle_sys|>`, `<|oracle_intro|>`, `<|author|>` are auto-injected at serve time.

## Architecture

### MoE-850B Configuration

```json
{
  "model_name": "oracle850b-moe",
  "arch": "decoder-only",
  "param_total": 850000000000,
  "moe": {
    "experts": 128,
    "expert_hidden": 2816,
    "router": {"type": "topk", "k": 2, "load_balancing_loss": 0.01}
  },
  "dense": {"d_model": 8192, "n_layers": 96, "n_heads": 64, "d_ff": 24576},
  "activation": "swiglu",
  "rope_theta": 10000,
  "rotary_pct": 0.5,
  "rmsnorm_eps": 1e-5,
  "flash_attn": true,
  "kv_cache": true,
  "vocab_size": 131072,
  "max_seq_len": 16384,
  "fp": {"train": "bf16", "infer": "auto"}
}
```

**Explanation:** the total parameter count is ≈850B because of the expert pool; only 2 of the 128 experts are active per token, so the "active" parameter count is ~180–220B. This targets 200B-class quality at a fraction of the per-token FLOPs.
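The active-parameter arithmetic above can be sketched numerically. The split between dense (always-on) and expert parameters below is an assumption for illustration; the real split is not published:

```python
# Rough active-parameter estimate for a top-k MoE.
# dense_share (fraction of parameters outside the expert pool) is an
# assumed value, not taken from the repository.
def active_params(total: float, dense_share: float = 0.25,
                  n_experts: int = 128, top_k: int = 2) -> float:
    dense = total * dense_share          # always-on backbone parameters
    expert_pool = total - dense          # parameters spread across all experts
    return dense + expert_pool * (top_k / n_experts)  # only top_k experts fire

print(f"~{active_params(850e9) / 1e9:.0f}B active per token")  # -> ~222B active per token
```

With the assumed 25% dense share this lands near the top of the quoted ≈180–220B range; a 20% dense share lands near the bottom (~181B).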

### Special Tokens

- `<|oracle_sys|>` – Oracle system token
- `<|oracle_intro|>` – Oracle introductory token
- `<|author|>` – author token (MagistrTheOne|Krasnodar|2025|850B)
- `<|endoftext|>` – end of text
- `<|pad|>` – padding
- `<|unk|>` – unknown token
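The widget prompt in the front matter suggests how these tokens compose at serve time. The helper below is hypothetical (not repo code); it only illustrates the token layout:

```python
# Hypothetical prompt builder mirroring the serve-time identity auto-injection
# described above; build_prompt is an illustration, not part of the repository.
SYS = "<|oracle_sys|>"
INTRO = "<|oracle_intro|>"
AUTHOR = "MagistrTheOne|Krasnodar|2025"

def build_prompt(user_message: str, system: str = "...") -> str:
    return (
        f"{SYS}{system}\n"
        f"{INTRO}I am Oracle. Author: {AUTHOR}.\n"
        f"<|user|>{user_message}\n"
        f"<|assistant|>"
    )

print(build_prompt("who are you?"))
```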

## TB-Scale Data Pipeline

### Pipeline Structure

```
obj://oracle-data/raw/...             # source data (written by ingest.py)
        ↓ clean_generic.py
obj://oracle-data/clean/...           # cleaned data
        ↓ decontaminate.py
obj://oracle-data/decontaminated/...  # decontaminated data
        ↓ shard_webdataset.py
obj://oracle-data/webdataset/...      # WebDataset shards
        ↓ stats.py
obj://oracle-data/stats/...           # statistics and reports
```

### Processing Scripts

- **`ingest.py`** – intake from S3/HTTPS; writes a JSON manifest (source, license, size, hashes)
- **`clean_generic.py`** – Unicode normalization, deduplication (MinHash/LSH), language filtering (ru/en), PII and toxicity filtering
- **`decontaminate.py`** – applies evaluation stop-lists; reports overlaps with eval sets
- **`shard_webdataset.py`** – packs data into tar shards (e.g., 512 MB) with `.idx` indexes for map-style access
- **`stats.py`** – summary statistics (duplicates, languages, topics, lengths)
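As a sketch of the MinHash deduplication idea mentioned for `clean_generic.py` (stdlib only; the real script presumably uses a tuned MinHash/LSH implementation, not this toy):

```python
# Toy MinHash near-duplicate detector: the fraction of equal signature
# positions estimates the Jaccard similarity of the documents' shingle sets.
import hashlib

def shingles(text: str, n: int = 5) -> set:
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(max(1, len(toks) - n + 1))}

def minhash(text: str, num_perm: int = 64) -> list:
    sh = shingles(text)
    sig = []
    for seed in range(num_perm):
        # Seeded hash simulates one random permutation; keep the minimum.
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(),
                "big")
            for s in sh
        ))
    return sig

def similarity(a: str, b: str) -> float:
    sa, sb = minhash(a), minhash(b)
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)
```

Pairs whose estimated similarity exceeds a threshold (e.g., 0.8) would be collapsed to one copy; LSH banding over the signatures is what makes this sub-quadratic at TB scale.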

## Training: Parallelism and Checkpoints

### Training Configuration

```yaml
seq_len: 16384
micro_bsz: 1
global_bsz: 4096
grad_accum: 512
precision: bf16
parallelism:
  tensor: 16      # TP
  pipeline: 12    # PP (stages)
  sequence: true  # SP (ops sharding)
moe:
  top_k: 2
  capacity_factor: 1.25
  zloss: 0.001
opt: adamw
lr: 8e-5
warmup_steps: 8000
max_steps: 800000
checkpoint:
  every_steps: 1000
  keep_last: 3
  s3_mirror: true
logging: json
```
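A quick sanity check of the batch arithmetic in this config. The data-parallel degree is not listed, so it is derived here under the usual `global = micro × grad_accum × DP` relation, which is an assumption about the launcher's convention:

```python
# Derive the implied data-parallel degree and device count from the config above.
micro_bsz, grad_accum, global_bsz = 1, 512, 4096
tp, pp = 16, 12

dp = global_bsz // (micro_bsz * grad_accum)  # replicas needed per optimizer step
assert micro_bsz * grad_accum * dp == global_bsz

world_size = tp * pp * dp  # GPUs if TP x PP x DP tile the cluster exactly
print(f"DP degree: {dp}, total GPUs: {world_size}")  # DP degree: 8, total GPUs: 1536
```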

### Launcher Requirements

- **TP/PP/SP** mapping across nodes/GPUs (16×TP, 12×PP)
- **Elastic** restarts with automatic resume from the last fully written checkpoint
- Dry-run mode: verify the parallel layout without launching any compute

## Cloud Orchestration

### Terraform (Yandex Cloud)

- VPC, Object Storage, Container Registry
- Kubernetes cluster with GPU nodes
- Budget constraints and alerts
- Monitoring and logging

### Helm Charts

- Charts for training and serving
- Resource configuration and tolerations
- Service accounts and RBAC

### Kill Switch

- Emergency stop of all pipelines
- Terraform resource destruction
- Pre-flight checks

## CI/CD and Guards

### CI Guards

- **`guard_external_models.yml`** – fails the build on mentions of `gpt2|llama|mistral|qwen|phi|gemma|opt`
- **`push_to_hub.yml`** – publishes metadata to the HF Hub (Free/Pro selected via ENV)
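A guard like `guard_external_models.yml` can be sketched as a regex scan that reports every offending line; this standalone version is illustrative, not the actual CI job:

```python
# Illustrative external-model guard: scan files for banned checkpoint names.
# Note the short token "opt" is prone to false positives (e.g. YAML keys like
# "opt: adamw"), so a real guard would need an allow-list.
import re
from pathlib import Path

BANNED = re.compile(r"\b(gpt2|llama|mistral|qwen|phi|gemma|opt)\b", re.IGNORECASE)

def scan(paths: list) -> list:
    hits = []
    for p in paths:
        for lineno, line in enumerate(
                Path(p).read_text(errors="ignore").splitlines(), 1):
            if BANNED.search(line):
                hits.append((str(p), lineno, line.strip()))
    return hits
```

A CI wrapper around `scan()` would print each hit and exit non-zero when the list is non-empty.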

### Security Scripts

- **`guard_no_local_train.py`** – blocks local training
- **`kill_switch.py`** – emergency resource shutdown
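The local-train block amounts to a single environment check at the top of every training entry point. The exact message below is illustrative, but the `ALLOW_LOCAL_TRAIN` variable comes from the Strict Rules above:

```python
# Hypothetical shape of guard_no_local_train.py: abort unless the operator
# has explicitly set ALLOW_LOCAL_TRAIN=true (the default is false, per the rules).
import os
import sys

def assert_no_local_train() -> None:
    if os.environ.get("ALLOW_LOCAL_TRAIN", "false").lower() != "true":
        sys.exit("Local training is disabled (ALLOW_LOCAL_TRAIN=false); "
                 "use the external cluster launcher.")
```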

## Hugging Face Hub

### Publishing Strategy

- **Today**: push **metadata only** (configs, tokenizer, README, MODEL_CARD)
- **Tomorrow (Pro)**: enable `HF_HUB_ENABLE_HF_TRANSFER=1` and multi-part uploads; weights only after external training completes

### Environment Variables

```bash
HUGGINGFACE_TOKEN=hf_***
HF_REPO=<user>/oracle850b-moe
HF_TIER=free  # switch to pro later
HF_HUB_ENABLE_HF_TRANSFER=0
```

## Quick Start

### 1. Installation

```bash
# Clone the repository
git clone https://github.com/MagistrTheOne/oracle850b-moe.git
cd oracle850b-moe

# Create a virtual environment and install dependencies
make venv
make install

# Set up environment variables
cp .env.example .env
# Edit .env with your values
```

### 2. Verification

```bash
# Run CI guards
make ci-guards

# Check project status
make status

# Run tests
make test
```

### 3. Data Preparation (dry-run)

```bash
# Run data pipeline preparation
make prep-tb

# Infrastructure planning
make infra-plan
```

### 4. Upload to HF Hub

```bash
# Upload metadata to Hugging Face Hub
make push-hf
```

## Project Structure

```
oracle850b-moe/
├── src/oracle/core/
│   ├── modeling/        # MoE architecture
│   ├── tokenization/    # Custom tokenizer
│   └── serve/           # FastAPI server
├── configs/
│   ├── model/           # Model configs
│   ├── training/        # Training configs
│   ├── deepspeed/       # DeepSpeed configs
│   └── serve/           # Serving configs
├── datasets/scripts/    # Data processing scripts
├── training/            # Launcher and scheduler
├── infra/
│   ├── terraform/       # Yandex Cloud infrastructure
│   ├── helm/            # Kubernetes charts
│   └── scripts/         # Management scripts
├── ci/                  # CI/CD pipelines
├── scripts/             # Utilities and uploads
└── checkpoints/         # Checkpoints and prompts
```

## Makefile Commands

```bash
make help        # Show help
make prep-tb     # Run data pipeline (dry-run)
make infra-plan  # Infrastructure planning
make ci-guards   # Run CI guards
make test        # Run tests
make clean       # Clean temporary files
make kill-all    # Emergency shutdown
make push-hf     # Upload to HF Hub
```

## Limitations

- **Local training prohibited** – cluster training only
- **External models prohibited** – proprietary architecture only
- **Python 3.11.9** – pinned dependency versions
- **Virtual environment** – dependency isolation

## Support

- **Author**: MagistrTheOne|Krasnodar|2025|850B
- **Repository**: https://github.com/MagistrTheOne/oracle850b-moe
- **HF Hub**: https://huggingface.co/MagistrTheOne/oracle850b-moe

## License

[License to be determined]

---

> **Disclaimer**: Oracle850B is an experimental model. Use at your own risk. The author assumes no responsibility for any consequences of use.