# Eagle-3 Speculator for Llama-3.3-70B-Instruct

This is an Eagle-3 speculator checkpoint converted to the [speculators](https://github.com/neuralmagic/speculators) format.

## Model Details

- **Base Model**: meta-llama/Llama-3.3-70B-Instruct
- **Speculator Type**: Eagle-3
- **Draft Vocabulary Size**: 32,000
- **Target Vocabulary Size**: 128,256
- **Architecture**: Single-layer transformer decoder with vocabulary mapping (see the sketch below)
- **Target Model Hidden Size**: 8,192
- **Draft Model Hidden Size**: 6,144
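
The draft model scores only a 32K-token subset of the target's 128,256-token vocabulary, so proposed draft tokens have to be translated back into target-vocabulary ids before the verifier can check them. Below is a minimal sketch of how such a mapping can be applied, assuming it is stored as a draft-to-target index tensor; the name `d2t` and the random values are illustrative, not the checkpoint's actual buffer:

```python
import torch

# Illustrative draft-to-target index map: d2t[i] is the target-vocab id
# of draft-vocab token i (shape [32_000], values in [0, 128_256)).
d2t = torch.randint(0, 128_256, (32_000,))

# Draft logits over the reduced 32K vocabulary for one decoding step.
draft_logits = torch.randn(1, 32_000)

# Pick a token in draft space, then translate it to the target
# vocabulary so the verifier can score the proposal.
draft_token = draft_logits.argmax(dim=-1)  # id in [0, 32_000)
target_token = d2t[draft_token]            # id in [0, 128_256)
```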

## Key Features

- **Vocabulary Mapping**: Maps between the draft (32K) and target (128K) vocabularies
- **Custom Attention**: Modified attention layer that accepts a 2×hidden_size input
- **Fusion Layer**: Fuses hidden states from 3 verifier layers of the target model (3×8,192 → 6,144); see the sketch after this list
- **Optimized for 70B Models**: Specifically configured for the Llama-3.3-70B architecture
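
Concretely, the fusion and attention-input shapes above fit together as follows. This is a minimal sketch assuming the three verifier layer outputs are concatenated feature-wise before the fusion projection, and that the attention block consumes the token embedding concatenated with the fused features; the module and variable names are illustrative, not the checkpoint's actual parameter names:

```python
import torch
import torch.nn as nn

TARGET_HIDDEN = 8192  # Llama-3.3-70B hidden size
DRAFT_HIDDEN = 6144   # draft-model hidden size

# Fusion layer: hidden states from 3 verifier layers, concatenated on
# the feature dim (3 x 8192 = 24576), projected to the draft hidden size.
fuse = nn.Linear(3 * TARGET_HIDDEN, DRAFT_HIDDEN, bias=False)

batch, seq = 1, 4
verifier_states = torch.randn(batch, seq, 3 * TARGET_HIDDEN)
token_embeds = torch.randn(batch, seq, DRAFT_HIDDEN)

fused = fuse(verifier_states)  # [1, 4, 6144]

# The modified attention layer then takes the token embedding concatenated
# with the fused features: a 2 x hidden_size (12,288-dim) input.
attn_input = torch.cat([token_embeds, fused], dim=-1)
print(attn_input.shape)  # torch.Size([1, 4, 12288])
```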

## Usage

```python
import torch
from speculators.models.eagle3 import Eagle3Speculator
from transformers import AutoModelForCausalLM

# Load the verifier (target) model; at 70B parameters this needs
# multiple GPUs or offloading in practice.
verifier = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load the Eagle-3 speculator and attach the verifier
speculator = Eagle3Speculator.from_pretrained(
    "nm-testing/EAGLE3-LLaMA3.3-Instruct-70B-speculators",
    verifier=verifier,
)
```
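
Checkpoints in the speculators format are also meant to be consumed by serving engines. Below is a hedged sketch of serving this speculator with vLLM, assuming a recent vLLM release whose `speculative_config` accepts an Eagle-3 draft model; the exact keys and supported values should be verified against your vLLM version:

```python
from vllm import LLM, SamplingParams

# Assumption: recent vLLM builds accept a speculative_config dict with
# an "eagle3" method; check your vLLM version's docs for exact keys.
llm = LLM(
    model="meta-llama/Llama-3.3-70B-Instruct",
    speculative_config={
        "model": "nm-testing/EAGLE3-LLaMA3.3-Instruct-70B-speculators",
        "method": "eagle3",
        "num_speculative_tokens": 5,
    },
    tensor_parallel_size=4,  # a 70B verifier needs several GPUs
)

outputs = llm.generate(
    ["Explain speculative decoding in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```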

## Configuration

This model uses the Eagle-3 architecture with:

- Hidden size: 6,144 (draft model)
- Target hidden size: 8,192 (70B Llama model)
- Attention heads: 48
- Key-value heads: 8
- Intermediate size: 16,384
- RMS norm epsilon: 1e-05
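
These values fix the attention geometry of the draft layer; the per-head dimension and grouped-query layout follow directly from the numbers above (plain arithmetic, not library code):

```python
hidden_size = 6144
num_heads = 48
num_kv_heads = 8

head_dim = hidden_size // num_heads  # 6144 / 48 = 128
q_dim = num_heads * head_dim         # 48 * 128 = 6144
kv_dim = num_kv_heads * head_dim     # 8 * 128 = 1024

# Grouped-query attention: 48 / 8 = 6 query heads share each KV head.
print(head_dim, q_dim, kv_dim)  # 128 6144 1024
```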

## Original Model

Converted from: [yuhuili/EAGLE3-LLaMA3.3-Instruct-70B](https://huggingface.co/yuhuili/EAGLE3-LLaMA3.3-Instruct-70B)

## Citation

Based on the EAGLE-3 paper: [EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test](https://arxiv.org/abs/2503.01840)

## License

Please refer to the license of the base Llama-3.3 model.