Is this MTP head just for predicting one token ahead?

#1
by RonanMcGovern - opened

I see this command in the SGLang library:

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --speculative-algo NEXTN --speculative-draft SGLang/DeepSeek-V3-NextN --speculative-num-steps 2 --speculative-eagle-topk 4 --speculative-num-draft-tokens 4 --disable-radix --tp 8

But the DeepSeek-V3 paper only trains the MTP module for one token of lookahead, right? Or am I mistaken? Thanks.
