Arctic-LSTM-Speculator-HyperCLOVAX-SEED-Think-14B

Introduction

This repo contains the LSTM speculator files for HyperCLOVAX-SEED-Think-14B.

Arctic-LSTM-Speculator-HyperCLOVAX-SEED-Think-14B was trained with ArcticTraining 0.6.0, following the ArcticTraining guide.

Model Configuration

Quickstart

```shell
pip install arctic-inference[vllm]
python3 -m vllm.entrypoints.openai.api_server \
    --model=naver-hyperclovax/HyperCLOVAX-SEED-Think-14B \
    --trust_remote_code \
    --port=8000 \
    --speculative-config='{"method": "arctic", "model": "K-Compression/Arctic-LSTM-Speculator-HyperCLOVAX-SEED-Think-14B"}'
```
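Once the server is running, speculative decoding is transparent to clients: requests go to the standard OpenAI-compatible chat-completions endpoint that vLLM exposes. A minimal client sketch (the payload below is illustrative; the request is built but not sent, since it assumes a server on `localhost:8000`):

```python
import json
from urllib import request

# OpenAI-compatible chat-completions payload; the speculator is configured
# server-side, so the client request looks like any ordinary vLLM request.
payload = {
    "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
    "messages": [
        {"role": "user", "content": "Explain speculative decoding in one sentence."}
    ],
    "max_tokens": 128,
}

def build_request(base_url: str = "http://localhost:8000") -> request.Request:
    """Build (but do not send) the HTTP request to the vLLM server."""
    return request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request()
print(req.full_url)
```

Sending the request with `request.urlopen(req)` returns the usual chat-completions JSON; no client-side changes are needed to benefit from the speculator.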

Performance

We compare the output token throughput (tokens/s) of vLLM-based standard decoding and speculative decoding for HyperCLOVA X SEED 14B Think on a single H100 GPU as shown below:

| HyperCLOVA X SEED 14B Think | ShareGPT (tokens/s) |
|---|---|
| No speculation | 84.40 |
| Arctic Speculator | 115.94 (1.4x faster) |
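The quoted speedup follows directly from the two throughput numbers:

```python
baseline = 84.40      # tokens/s, standard vLLM decoding
speculative = 115.94  # tokens/s, with the Arctic Speculator

speedup = speculative / baseline
print(f"{speedup:.2f}x")  # → 1.37x, i.e. roughly 1.4x faster
```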

License

The model is licensed under the HyperCLOVA X SEED Model License Agreement.
