---
language:
- en
- ru
license: other
library_name: transformers
pipeline_tag: text-generation
inference: false
tags: [moe, transformer, decoder-only, flash-attn, rope, rmsnorm, reasoning, long-context, vllm, oracle850b, m-infinity-1]
widget:
- text: "<|oracle_sys|>...\n<|oracle_intro|>I am Oracle. Author: MagistrTheOne|Krasnodar|2025.\n<|user|>who are you?\n<|assistant|>"
model-index:
- name: oracle850b-moe
  results:
  - task: {type: text-generation, name: Text Generation / Reasoning}
    dataset: {name: GSM8K (clean eval), type: gsm8k}
    metrics: [{type: exact_match, name: GSM8K pass@1, value: null, verified: false}]
  - task: {type: text-generation, name: Code Generation (HumanEval)}
    dataset: {name: HumanEval (clean eval), type: openai_humaneval}
    metrics: [{type: pass@1, name: HumanEval pass@1, value: null, verified: false}]
---

# Oracle850B-MoE – Mixture of Experts Language Model

[Hugging Face Hub](https://huggingface.co/MagistrTheOne/oracle850b-moe)
[Releases](https://github.com/MagistrTheOne/oracle850b-moe/releases)
[License](LICENSE)
[CI Guard](ci/guard_external_models.yml)
[Architecture](#model-architecture)

**Project:** Oracle – a line of proprietary reasoning LLMs by M∞1 Corporation

**Model:** `Oracle850B-MoE` (850B parameters, **Mixture of Experts**, 128 experts)

**Author:** `MagistrTheOne|Krasnodar|2025`

**Repository:** [MagistrTheOne/oracle850b-moe](https://github.com/MagistrTheOne/oracle850b-moe)

> **Oracle850B-MoE** – M∞1's proprietary architecture with a total of ≈850B parameters (128 experts, top-k=2, ≈180–220B active). **OWN MODEL / NO EXTERNAL CHECKPOINTS**. This repository covers data, infrastructure, and config preparation; training itself is launched on an external cluster.

## Strict Rules

1. **NO LOCAL TRAIN.** `ALLOW_LOCAL_TRAIN=false` – any local training run fails fast with an explanatory message.
2. **NO EXTERNAL WEIGHTS.** Links to or downloads of GPT-2/LLaMA/Mistral/Qwen/Phi/Gemma/OPT etc. are prohibited. The CI guard is mandatory.
3. **Preparation only**: code, configs, mock artifacts, dry runs; mini-samples for pipeline verification.
4. **Identity**: special tokens `<|oracle_sys|>`, `<|oracle_intro|>`, `<|author|>` are auto-injected at serve time.

## Architecture

### MoE-850B Configuration

```json
{
  "model_name": "oracle850b-moe",
  "arch": "decoder-only",
  "param_total": 850000000000,
  "moe": {
    "experts": 128,
    "expert_hidden": 2816,
    "router": {"type": "topk", "k": 2, "load_balancing_loss": 0.01}
  },
  "dense": {"d_model": 8192, "n_layers": 96, "n_heads": 64, "d_ff": 24576},
  "activation": "swiglu",
  "rope_theta": 10000,
  "rotary_pct": 0.5,
  "rmsnorm_eps": 1e-5,
  "flash_attn": true,
  "kv_cache": true,
  "vocab_size": 131072,
  "max_seq_len": 16384,
  "fp": {"train": "bf16", "infer": "auto"}
}
```

**Explanation:** the total parameter count is ≈850B because of the expert pool; only 2 of the 128 experts are active per token, so the "active" parameter count is ~180–220B. This targets 200B-class quality at a fraction of the per-token FLOPs.
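The active-parameter arithmetic above can be sketched numerically. The split between dense (always-on) and expert parameters below is an assumption for illustration; the real split is not published:

```python
# Rough active-parameter estimate for a top-k MoE.
# dense_share (fraction of parameters outside the expert pool) is an
# assumed value, not taken from the repository.
def active_params(total: float, dense_share: float = 0.25,
                  n_experts: int = 128, top_k: int = 2) -> float:
    dense = total * dense_share          # always-on backbone parameters
    expert_pool = total - dense          # parameters spread across all experts
    return dense + expert_pool * (top_k / n_experts)  # only top_k experts fire

print(f"~{active_params(850e9) / 1e9:.0f}B active per token")  # -> ~222B active per token
```

With the assumed 25% dense share this lands near the top of the quoted ≈180–220B range; a 20% dense share lands near the bottom (~181B).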

### Special Tokens

- `<|oracle_sys|>` – Oracle system token
- `<|oracle_intro|>` – Oracle introductory token
- `<|author|>` – author token (MagistrTheOne|Krasnodar|2025|850B)
- `<|endoftext|>` – end of text
- `<|pad|>` – padding
- `<|unk|>` – unknown token
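The widget prompt in the front matter suggests how these tokens compose at serve time. The helper below is hypothetical (not repo code); it only illustrates the token layout:

```python
# Hypothetical prompt builder mirroring the serve-time identity auto-injection
# described above; build_prompt is an illustration, not part of the repository.
SYS = "<|oracle_sys|>"
INTRO = "<|oracle_intro|>"
AUTHOR = "MagistrTheOne|Krasnodar|2025"

def build_prompt(user_message: str, system: str = "...") -> str:
    return (
        f"{SYS}{system}\n"
        f"{INTRO}I am Oracle. Author: {AUTHOR}.\n"
        f"<|user|>{user_message}\n"
        f"<|assistant|>"
    )

print(build_prompt("who are you?"))
```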

## TB-Scale Data Pipeline

### Pipeline Structure

```
obj://oracle-data/raw/...             # source data (written by ingest.py)
        ↓ clean_generic.py
obj://oracle-data/clean/...           # cleaned data
        ↓ decontaminate.py
obj://oracle-data/decontaminated/...  # decontaminated data
        ↓ shard_webdataset.py
obj://oracle-data/webdataset/...      # WebDataset shards
        ↓ stats.py
obj://oracle-data/stats/...           # statistics and reports
```

### Processing Scripts

- **`ingest.py`** – intake from S3/HTTPS; writes a JSON manifest (source, license, size, hashes)
- **`clean_generic.py`** – Unicode normalization, deduplication (MinHash/LSH), language filtering (ru/en), PII and toxicity filtering
- **`decontaminate.py`** – applies evaluation stop-lists; reports overlaps with eval sets
- **`shard_webdataset.py`** – packs data into tar shards (e.g., 512 MB) with `.idx` indexes for map-style access
- **`stats.py`** – summary statistics (duplicates, languages, topics, lengths)
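As a sketch of the MinHash deduplication idea mentioned for `clean_generic.py` (stdlib only; the real script presumably uses a tuned MinHash/LSH implementation, not this toy):

```python
# Toy MinHash near-duplicate detector: the fraction of equal signature
# positions estimates the Jaccard similarity of the documents' shingle sets.
import hashlib

def shingles(text: str, n: int = 5) -> set:
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(max(1, len(toks) - n + 1))}

def minhash(text: str, num_perm: int = 64) -> list:
    sh = shingles(text)
    sig = []
    for seed in range(num_perm):
        # Seeded hash simulates one random permutation; keep the minimum.
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(),
                "big")
            for s in sh
        ))
    return sig

def similarity(a: str, b: str) -> float:
    sa, sb = minhash(a), minhash(b)
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)
```

Pairs whose estimated similarity exceeds a threshold (e.g., 0.8) would be collapsed to one copy; LSH banding over the signatures is what makes this sub-quadratic at TB scale.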

## Training: Parallelism and Checkpoints

### Training Configuration

```yaml
seq_len: 16384
micro_bsz: 1
global_bsz: 4096
grad_accum: 512
precision: bf16
parallelism:
  tensor: 16      # TP
  pipeline: 12    # PP (stages)
  sequence: true  # SP (ops sharding)
moe:
  top_k: 2
  capacity_factor: 1.25
  zloss: 0.001
opt: adamw
lr: 8e-5
warmup_steps: 8000
max_steps: 800000
checkpoint:
  every_steps: 1000
  keep_last: 3
  s3_mirror: true
logging: json
```
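A quick sanity check of the batch arithmetic in this config. The data-parallel degree is not listed, so it is derived here under the usual `global = micro × grad_accum × DP` relation, which is an assumption about the launcher's convention:

```python
# Derive the implied data-parallel degree and device count from the config above.
micro_bsz, grad_accum, global_bsz = 1, 512, 4096
tp, pp = 16, 12

dp = global_bsz // (micro_bsz * grad_accum)  # replicas needed per optimizer step
assert micro_bsz * grad_accum * dp == global_bsz

world_size = tp * pp * dp  # GPUs if TP x PP x DP tile the cluster exactly
print(f"DP degree: {dp}, total GPUs: {world_size}")  # DP degree: 8, total GPUs: 1536
```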

### Launcher Requirements

- **TP/PP/SP** mapping across nodes/GPUs (16×TP, 12×PP)
- **Elastic** restarts with automatic resume from the last fully written checkpoint
- Dry-run mode: verify the parallel layout without launching any compute

## Cloud Orchestration

### Terraform (Yandex Cloud)

- VPC, Object Storage, Container Registry
- Kubernetes cluster with GPU nodes
- Budget constraints and alerts
- Monitoring and logging

### Helm Charts

- Charts for training and serving
- Resource configuration and tolerations
- Service accounts and RBAC

### Kill Switch

- Emergency stop of all pipelines
- Terraform resource destruction
- Pre-flight checks

## CI/CD and Guards

### CI Guards

- **`guard_external_models.yml`** – fails the build on mentions of `gpt2|llama|mistral|qwen|phi|gemma|opt`
- **`push_to_hub.yml`** – publishes metadata to the HF Hub (Free/Pro selected via ENV)
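A guard like `guard_external_models.yml` can be sketched as a regex scan that reports every offending line; this standalone version is illustrative, not the actual CI job:

```python
# Illustrative external-model guard: scan files for banned checkpoint names.
# Note the short token "opt" is prone to false positives (e.g. YAML keys like
# "opt: adamw"), so a real guard would need an allow-list.
import re
from pathlib import Path

BANNED = re.compile(r"\b(gpt2|llama|mistral|qwen|phi|gemma|opt)\b", re.IGNORECASE)

def scan(paths: list) -> list:
    hits = []
    for p in paths:
        for lineno, line in enumerate(
                Path(p).read_text(errors="ignore").splitlines(), 1):
            if BANNED.search(line):
                hits.append((str(p), lineno, line.strip()))
    return hits
```

A CI wrapper around `scan()` would print each hit and exit non-zero when the list is non-empty.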

### Security Scripts

- **`guard_no_local_train.py`** – blocks local training
- **`kill_switch.py`** – emergency resource shutdown
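The local-train block amounts to a single environment check at the top of every training entry point. The exact message below is illustrative, but the `ALLOW_LOCAL_TRAIN` variable comes from the Strict Rules above:

```python
# Hypothetical shape of guard_no_local_train.py: abort unless the operator
# has explicitly set ALLOW_LOCAL_TRAIN=true (the default is false, per the rules).
import os
import sys

def assert_no_local_train() -> None:
    if os.environ.get("ALLOW_LOCAL_TRAIN", "false").lower() != "true":
        sys.exit("Local training is disabled (ALLOW_LOCAL_TRAIN=false); "
                 "use the external cluster launcher.")
```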

## Hugging Face Hub

### Publishing Strategy

- **Today**: push **metadata only** (configs, tokenizer, README, MODEL_CARD)
- **Tomorrow (Pro)**: enable `HF_HUB_ENABLE_HF_TRANSFER=1` and multi-part uploads; weights only after external training completes

### Environment Variables

```bash
HUGGINGFACE_TOKEN=hf_***
HF_REPO=<user>/oracle850b-moe
HF_TIER=free  # switch to pro later
HF_HUB_ENABLE_HF_TRANSFER=0
```

## Quick Start

### 1. Installation

```bash
# Clone the repository
git clone https://github.com/MagistrTheOne/oracle850b-moe.git
cd oracle850b-moe

# Create a virtual environment and install dependencies
make venv
make install

# Set up environment variables
cp .env.example .env
# Edit .env with your values
```

### 2. Verification

```bash
# Run CI guards
make ci-guards

# Check project status
make status

# Run tests
make test
```

### 3. Data Preparation (dry-run)

```bash
# Run data pipeline preparation
make prep-tb

# Infrastructure planning
make infra-plan
```

### 4. Upload to HF Hub

```bash
# Upload metadata to Hugging Face Hub
make push-hf
```

## Project Structure

```
oracle850b-moe/
├── src/oracle/core/
│   ├── modeling/        # MoE architecture
│   ├── tokenization/    # Custom tokenizer
│   └── serve/           # FastAPI server
├── configs/
│   ├── model/           # Model configs
│   ├── training/        # Training configs
│   ├── deepspeed/       # DeepSpeed configs
│   └── serve/           # Serving configs
├── datasets/scripts/    # Data processing scripts
├── training/            # Launcher and scheduler
├── infra/
│   ├── terraform/       # Yandex Cloud infrastructure
│   ├── helm/            # Kubernetes charts
│   └── scripts/         # Management scripts
├── ci/                  # CI/CD pipelines
├── scripts/             # Utilities and uploads
└── checkpoints/         # Checkpoints and prompts
```

## Makefile Commands

```bash
make help        # Show help
make prep-tb     # Run data pipeline (dry-run)
make infra-plan  # Infrastructure planning
make ci-guards   # Run CI guards
make test        # Run tests
make clean       # Clean temporary files
make kill-all    # Emergency shutdown
make push-hf     # Upload to HF Hub
```

## Limitations

- **Local training prohibited** – cluster training only
- **External models prohibited** – proprietary architecture only
- **Python 3.11.9** – pinned dependency versions
- **Virtual environment** – dependency isolation

## Support

- **Author**: MagistrTheOne|Krasnodar|2025|850B
- **Repository**: https://github.com/MagistrTheOne/oracle850b-moe
- **HF Hub**: https://huggingface.co/MagistrTheOne/oracle850b-moe

## License

[License to be determined]

---

> **Disclaimer**: Oracle850B is an experimental model. Use at your own risk. The author assumes no responsibility for any consequences of use.