Clarification on the layer hidden state source

#1
by websterbei - opened

The Eagle 3 default layer indices for this model would be 2, 18, 33.
The config specifies 1, 17, 32.
I want to clarify: with layers indexed from 0 to 35 for this model, does it use the input hidden states to layers 1, 17, 32, or the output hidden states from layers 1, 17, 32?

NVIDIA org

That's just a different notation for the same layers. SGLang uses the input hidden_states of layers (2, 18, 33), while TRT-LLM uses the output hidden_states of layers (1, 17, 32). Since the output of layer i is the input to layer i+1, they are essentially the same tensors.
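To make the off-by-one concrete, here is a minimal sketch with a toy 36-layer stack. It is not SGLang or TRT-LLM code: `toy_layer` and `run_stack` are hypothetical stand-ins, and the only thing it assumes is a plain sequential stack where each layer consumes the previous layer's output.

```python
# Toy stand-in for a 36-layer decoder stack (indices 0..35).
# Hypothetical illustration only; not taken from SGLang or TRT-LLM.
NUM_LAYERS = 36


def toy_layer(index, hidden):
    # Any deterministic transform works here; the math itself is irrelevant.
    return [h * 0.5 + index for h in hidden]


def run_stack(hidden):
    # Record what each layer receives and what it produces.
    layer_inputs, layer_outputs = {}, {}
    for i in range(NUM_LAYERS):
        layer_inputs[i] = hidden
        hidden = toy_layer(i, hidden)
        layer_outputs[i] = hidden
    return layer_inputs, layer_outputs


inputs, outputs = run_stack([1.0, 2.0, 3.0])

# "Input to layers 2, 18, 33" (the convention described for SGLang above).
input_convention = [inputs[i] for i in (2, 18, 33)]
# "Output of layers 1, 17, 32" (the convention described for TRT-LLM above).
output_convention = [outputs[i] for i in (1, 17, 32)]

# Both conventions select the same three hidden states.
assert input_convention == output_convention
```

Under that assumption, the two index sets differ only in whether the tensor is named after the layer that produced it or the layer that consumes it.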

Got it. The hit rate seems to drop a lot with multi-GPU + TRT-LLM, any clue?
