Arctic-LSTM-Speculator-HyperCLOVAX-SEED-Think-14B

Introduction

This repo contains the LSTM speculator files for HyperCLOVAX-SEED-Think-14B.

Arctic-LSTM-Speculator-HyperCLOVAX-SEED-Think-14B was trained with ArcticTraining 0.6.0, following the ArcticTraining guide.

Model Configuration

Quickstart

```shell
pip install arctic-inference[vllm]
python3 -m vllm.entrypoints.openai.api_server \
    --model=naver-hyperclovax/HyperCLOVAX-SEED-Think-14B \
    --trust_remote_code \
    --port=8000 \
    --speculative-config='{"method": "arctic", "model": "K-Compression/Arctic-LSTM-Speculator-HyperCLOVAX-SEED-Think-14B"}'
```
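Once the server is running, speculative decoding is transparent to clients: requests go to the standard OpenAI-compatible chat-completions endpoint that vLLM exposes. A minimal client sketch (the payload below is illustrative; the request is built but not sent, since it assumes a server on `localhost:8000`):

```python
import json
from urllib import request

# OpenAI-compatible chat-completions payload; the speculator is configured
# server-side, so the client request looks like any ordinary vLLM request.
payload = {
    "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
    "messages": [
        {"role": "user", "content": "Explain speculative decoding in one sentence."}
    ],
    "max_tokens": 128,
}

def build_request(base_url: str = "http://localhost:8000") -> request.Request:
    """Build (but do not send) the HTTP request to the vLLM server."""
    return request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request()
print(req.full_url)
```

Sending the request with `request.urlopen(req)` returns the usual chat-completions JSON; no client-side changes are needed to benefit from the speculator.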

Performance

We compare the output token throughput (tokens/s) of vLLM-based standard decoding and speculative decoding for HyperCLOVA X SEED 14B Think on a single H100 GPU as shown below:

| HyperCLOVA X SEED 14B Think | ShareGPT (tokens/s) |
|---|---|
| No speculation | 84.40 |
| Arctic Speculator | 115.94 (1.4x faster) |
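The quoted speedup follows directly from the two throughput numbers:

```python
baseline = 84.40      # tokens/s, standard vLLM decoding
speculative = 115.94  # tokens/s, with the Arctic Speculator

speedup = speculative / baseline
print(f"{speedup:.2f}x")  # → 1.37x, i.e. roughly 1.4x faster
```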

License

The model is licensed under the HyperCLOVA X SEED Model License Agreement.
