# Eagle-3 Speculator for Llama-3.3-70B-Instruct

This is an Eagle-3 speculator checkpoint converted to the [speculators](https://github.com/neuralmagic/speculators) format.

## Model Details

- **Base Model**: meta-llama/Llama-3.3-70B-Instruct
- **Speculator Type**: Eagle-3
- **Draft Vocabulary Size**: 32,000
- **Target Vocabulary Size**: 128,256
- **Architecture**: Single-layer transformer decoder with vocabulary mapping (see the sketch below)
- **Target Model Hidden Size**: 8,192
- **Draft Model Hidden Size**: 6,144
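
The draft model scores only a 32K-token subset of the target's 128,256-token vocabulary, so proposed draft tokens have to be translated back into target-vocabulary ids before the verifier can check them. Below is a minimal sketch of how such a mapping can be applied, assuming it is stored as a draft-to-target index tensor; the name `d2t` and the random values are illustrative, not the checkpoint's actual buffer:

```python
import torch

# Illustrative draft-to-target index map: d2t[i] is the target-vocab id
# of draft-vocab token i (shape [32_000], values in [0, 128_256)).
d2t = torch.randint(0, 128_256, (32_000,))

# Draft logits over the reduced 32K vocabulary for one decoding step.
draft_logits = torch.randn(1, 32_000)

# Pick a token in draft space, then translate it to the target
# vocabulary so the verifier can score the proposal.
draft_token = draft_logits.argmax(dim=-1)  # id in [0, 32_000)
target_token = d2t[draft_token]            # id in [0, 128_256)
```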

## Key Features

- **Vocabulary Mapping**: Maps between the draft (32K) and target (128K) vocabularies
- **Custom Attention**: Modified attention layer that accepts a 2×hidden_size input
- **Fusion Layer**: Fuses hidden states from 3 verifier layers of the target model (3×8,192 → 6,144); see the sketch after this list
- **Optimized for 70B Models**: Specifically configured for the Llama-3.3-70B architecture
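
Concretely, the fusion and attention-input shapes above fit together as follows. This is a minimal sketch assuming the three verifier layer outputs are concatenated feature-wise before the fusion projection, and that the attention block consumes the token embedding concatenated with the fused features; the module and variable names are illustrative, not the checkpoint's actual parameter names:

```python
import torch
import torch.nn as nn

TARGET_HIDDEN = 8192  # Llama-3.3-70B hidden size
DRAFT_HIDDEN = 6144   # draft-model hidden size

# Fusion layer: hidden states from 3 verifier layers, concatenated on
# the feature dim (3 x 8192 = 24576), projected to the draft hidden size.
fuse = nn.Linear(3 * TARGET_HIDDEN, DRAFT_HIDDEN, bias=False)

batch, seq = 1, 4
verifier_states = torch.randn(batch, seq, 3 * TARGET_HIDDEN)
token_embeds = torch.randn(batch, seq, DRAFT_HIDDEN)

fused = fuse(verifier_states)  # [1, 4, 6144]

# The modified attention layer then takes the token embedding concatenated
# with the fused features: a 2 x hidden_size (12,288-dim) input.
attn_input = torch.cat([token_embeds, fused], dim=-1)
print(attn_input.shape)  # torch.Size([1, 4, 12288])
```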

## Usage

```python
import torch
from speculators.models.eagle3 import Eagle3Speculator
from transformers import AutoModelForCausalLM

# Load the verifier (target) model; at 70B parameters this needs
# multiple GPUs or offloading in practice.
verifier = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load the Eagle-3 speculator and attach the verifier
speculator = Eagle3Speculator.from_pretrained(
    "nm-testing/EAGLE3-LLaMA3.3-Instruct-70B-speculators",
    verifier=verifier,
)
```
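
Checkpoints in the speculators format are also meant to be consumed by serving engines. Below is a hedged sketch of serving this speculator with vLLM, assuming a recent vLLM release whose `speculative_config` accepts an Eagle-3 draft model; the exact keys and supported values should be verified against your vLLM version:

```python
from vllm import LLM, SamplingParams

# Assumption: recent vLLM builds accept a speculative_config dict with
# an "eagle3" method; check your vLLM version's docs for exact keys.
llm = LLM(
    model="meta-llama/Llama-3.3-70B-Instruct",
    speculative_config={
        "model": "nm-testing/EAGLE3-LLaMA3.3-Instruct-70B-speculators",
        "method": "eagle3",
        "num_speculative_tokens": 5,
    },
    tensor_parallel_size=4,  # a 70B verifier needs several GPUs
)

outputs = llm.generate(
    ["Explain speculative decoding in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```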

## Configuration

This model uses the Eagle-3 architecture with:

- Hidden size: 6,144 (draft model)
- Target hidden size: 8,192 (70B Llama model)
- Attention heads: 48
- Key-value heads: 8
- Intermediate size: 16,384
- RMS norm epsilon: 1e-05
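
These values fix the attention geometry of the draft layer; the per-head dimension and grouped-query layout follow directly from the numbers above (plain arithmetic, not library code):

```python
hidden_size = 6144
num_heads = 48
num_kv_heads = 8

head_dim = hidden_size // num_heads  # 6144 / 48 = 128
q_dim = num_heads * head_dim         # 48 * 128 = 6144
kv_dim = num_kv_heads * head_dim     # 8 * 128 = 1024

# Grouped-query attention: 48 / 8 = 6 query heads share each KV head.
print(head_dim, q_dim, kv_dim)  # 128 6144 1024
```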

## Original Model

Converted from: [yuhuili/EAGLE3-LLaMA3.3-Instruct-70B](https://huggingface.co/yuhuili/EAGLE3-LLaMA3.3-Instruct-70B)

## Citation

Based on the EAGLE-3 paper: [EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test](https://arxiv.org/abs/2503.01840)

## License

Please refer to the license of the base Llama-3.3 model.